Kalman Filter

Implementation of Kalman filters in PyTorch, with parameter optimization via gradient descent

Introduction

The model uses a latent state variable \(x\), modelled over time, to impute gaps in \(y\).

The assumption of the model is that the state variable \(x_t\) at time \(t\) depends only on the previous state \(x_{t-1}\) and not on any earlier states (the Markov property).

Equations

The equations of the model are:

\[\begin{align} p(x_t | x_{t-1}) & = \mathcal{N}(Ax_{t-1} + b, Q) \\ p(y_t | x_t) & = \mathcal{N}(Hx_t + d, R) \end{align}\]

where:

  • \(A\) is the state transition matrix
  • \(b\) is the state transition offset
  • \(Q\) is the state transition covariance matrix
  • \(H\) is the observation matrix
  • \(d\) is the observation offset
  • \(R\) is the observation covariance matrix

In addition, the model has the parameters of the initial state, which are used to initialize the filter:

  • m0, the initial state mean (\(m_0\))
  • P0, the initial state covariance (\(P_0\))
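
Putting the two model equations together, one step of the generative model can be sketched as follows (a minimal sketch with assumed shapes; simulate_step is a hypothetical helper, not part of the library):

import torch

def simulate_step(A, b, Q, H, d, R, x_prev):
    # x_t ~ N(A x_{t-1} + b, Q);  y_t ~ N(H x_t + d, R)
    x_t = A @ x_prev + b + torch.distributions.MultivariateNormal(
        torch.zeros(A.shape[0]), Q).sample()
    y_t = H @ x_t + d + torch.distributions.MultivariateNormal(
        torch.zeros(H.shape[0]), R).sample()
    return x_t, y_t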

The Kalman filter has 3 steps:

  • predict (estimate the state at time t using the observations up to time t-1)
  • update (correct the state at time t using the observation at time t)
  • smooth (refine the state at time t using the information from later observations)

In case of missing data the update step is skipped.

After smoothing, the missing data at time t (\(y_t\)) can be imputed from the state (\(x_t\)) using this formula: \[p(y_t|x_t) = \mathcal{N}(Hx_t + d, R + HP^s_tH^T)\]
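
For reference, a minimal sketch of this computation (impute_obs is a hypothetical helper; \(H\), \(d\), \(R\) are the model parameters above and m_s, P_s are the smoothed state mean and covariance):

def impute_obs(H, d, R, m_s, P_s):
    # p(y_t | x_t) = N(H m_s + d, R + H P_s H^T)
    return H @ m_s + d, R + H @ P_s @ H.mT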

The Kalman Filter is an algorithm designed to estimate \(P(x_t | y_{0:t})\). As all state transitions and observations are linear with Gaussian-distributed noise, these distributions can be represented exactly as Gaussian distributions with mean ms[t] and covariance Ps[t]. Similarly, the Kalman Smoother is an algorithm designed to estimate \(P(x_t | y_{0:T})\), i.e. the state given all the observations.

Kalman Filter Base

Utils

The helper _add_batch_dim normalizes tensor shapes: a 1-D tensor becomes a batched column vector of shape (1, n, 1), a 2-D tensor gains a leading batch dimension, and a 3-D tensor is left unchanged, as the examples below show.

show_as_row(_add_batch_dim(torch.ones(2)), _add_batch_dim(torch.ones(2,2)), _add_batch_dim(torch.ones(2,2,2)))

#0

tensor([[[1.],
         [1.]]])

#1

tensor([[[1., 1.],
         [1., 1.]]])

#2

tensor([[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]])
show_as_row(_add_batch_dim(torch.ones(2)).shape, _add_batch_dim(torch.ones(2,2)).shape, _add_batch_dim(torch.ones(2,2,2)).shape)

#0

torch.Size([1, 2, 1])

#1

torch.Size([1, 2, 2])

#2

torch.Size([2, 2, 2])

Kalman Filter Base


source

KalmanFilterBase

 KalmanFilterBase (A:torch.Tensor, H:torch.Tensor, B:torch.Tensor,
                   Q:torch.Tensor, R:torch.Tensor, b:torch.Tensor,
                   d:torch.Tensor, m0:torch.Tensor, P0:torch.Tensor,
                   n_dim_state:int=None, n_dim_obs:int=None,
                   n_dim_contr:int=None,
                   var_names:Optional[Iterable[str]]=None,
                   contr_names:Optional[Iterable[str]]=None,
                   cov_checker:meteo_imp.gaussian.CheckPosDef|None=None,
                   use_conditional:bool=False, use_control:bool=True,
                   use_smooth:bool=True, pred_only_gap:bool=False,
                   pred_std:bool=False)

Base class for handling Kalman Filter implementation in PyTorch

Type Default Details
A Tensor [n_dim_state,n_dim_state] \(A\), state transition matrix
H Tensor [n_dim_obs, n_dim_state] \(H\), observation matrix
B Tensor [n_dim_state, n_dim_contr] \(B\) control matrix
Q Tensor [n_dim_state, n_dim_state] \(Q\), state transition covariance matrix
R Tensor [n_dim_obs, n_dim_obs] \(R\), observations covariance matrix
b Tensor [n_dim_state] \(b\), state transition offset
d Tensor [n_dim_obs] \(d\), observations offset
m0 Tensor [n_dim_state] \(m_0\)
P0 Tensor [n_dim_state, n_dim_state] \(P_0\)
n_dim_state int None Number of dimensions for state - default inferred from parameters
n_dim_obs int None Number of dimensions for observations - default inferred from parameters
n_dim_contr int None Number of dimensions for control - default inferred from parameters
var_names Optional None Names of variables for printing
contr_names Optional None Names of control variables for printing
cov_checker meteo_imp.gaussian.CheckPosDef | None None Check covariance at every step
use_conditional bool False Use conditional distribution for gaps that don’t have all variables missing
use_control bool True Use the control in the filter
use_smooth bool True Use smoother for predictions (otherwise filter only)
pred_only_gap bool False if True, predictions are only for the gap
pred_std bool False return only stds and not covariances

Constructors

Giving all the parameters manually to the KalmanFilterBase init method is not convenient, hence there are some class methods that help initialize the class.

Due to a bug in fastcore it is not possible to subclass after creating the class methods.


source

KalmanFilter

 KalmanFilter (A:torch.Tensor, H:torch.Tensor, B:torch.Tensor,
               Q:torch.Tensor, R:torch.Tensor, b:torch.Tensor,
               d:torch.Tensor, m0:torch.Tensor, P0:torch.Tensor,
               n_dim_state:int=None, n_dim_obs:int=None,
               n_dim_contr:int=None,
               var_names:Optional[Iterable[str]]=None,
               contr_names:Optional[Iterable[str]]=None,
               cov_checker:meteo_imp.gaussian.CheckPosDef|None=None,
               use_conditional:bool=False, use_control:bool=True,
               use_smooth:bool=True, pred_only_gap:bool=False,
               pred_std:bool=False)

Base class for handling Kalman Filter implementation in PyTorch

Type Default Details
A Tensor [n_dim_state,n_dim_state] \(A\), state transition matrix
H Tensor [n_dim_obs, n_dim_state] \(H\), observation matrix
B Tensor [n_dim_state, n_dim_contr] \(B\) control matrix
Q Tensor [n_dim_state, n_dim_state] \(Q\), state transition covariance matrix
R Tensor [n_dim_obs, n_dim_obs] \(R\), observations covariance matrix
b Tensor [n_dim_state] \(b\), state transition offset
d Tensor [n_dim_obs] \(d\), observations offset
m0 Tensor [n_dim_state] \(m_0\)
P0 Tensor [n_dim_state, n_dim_state] \(P_0\)
n_dim_state int None Number of dimensions for state - default inferred from parameters
n_dim_obs int None Number of dimensions for observations - default inferred from parameters
n_dim_contr int None Number of dimensions for control - default inferred from parameters
var_names Optional None Names of variables for printing
contr_names Optional None Names of control variables for printing
cov_checker meteo_imp.gaussian.CheckPosDef | None None Check covariance at every step
use_conditional bool False Use conditional distribution for gaps that don’t have all variables missing
use_control bool True Use the control in the filter
use_smooth bool True Use smoother for predictions (otherwise filter only)
pred_only_gap bool False if True, predictions are only for the gap
pred_std bool False return only stds and not covariances

source

KalmanFilterSR

 KalmanFilterSR (A:torch.Tensor, H:torch.Tensor, B:torch.Tensor,
                 Q:torch.Tensor, R:torch.Tensor, b:torch.Tensor,
                 d:torch.Tensor, m0:torch.Tensor, P0:torch.Tensor,
                 n_dim_state:int=None, n_dim_obs:int=None,
                 n_dim_contr:int=None,
                 var_names:Optional[Iterable[str]]=None,
                 contr_names:Optional[Iterable[str]]=None,
                 cov_checker:meteo_imp.gaussian.CheckPosDef|None=None,
                 use_conditional:bool=False, use_control:bool=True,
                 use_smooth:bool=True, pred_only_gap:bool=False,
                 pred_std:bool=False)

Base class for handling Kalman Filter implementation in PyTorch

Type Default Details
A Tensor [n_dim_state,n_dim_state] \(A\), state transition matrix
H Tensor [n_dim_obs, n_dim_state] \(H\), observation matrix
B Tensor [n_dim_state, n_dim_contr] \(B\) control matrix
Q Tensor [n_dim_state, n_dim_state] \(Q\), state transition covariance matrix
R Tensor [n_dim_obs, n_dim_obs] \(R\), observations covariance matrix
b Tensor [n_dim_state] \(b\), state transition offset
d Tensor [n_dim_obs] \(d\), observations offset
m0 Tensor [n_dim_state] \(m_0\)
P0 Tensor [n_dim_state, n_dim_state] \(P_0\)
n_dim_state int None Number of dimensions for state - default inferred from parameters
n_dim_obs int None Number of dimensions for observations - default inferred from parameters
n_dim_contr int None Number of dimensions for control - default inferred from parameters
var_names Optional None Names of variables for printing
contr_names Optional None Names of control variables for printing
cov_checker meteo_imp.gaussian.CheckPosDef | None None Check covariance at every step
use_conditional bool False Use conditional distribution for gaps that don’t have all variables missing
use_control bool True Use the control in the filter
use_smooth bool True Use smoother for predictions (otherwise filter only)
pred_only_gap bool False if True, predictions are only for the gap
pred_std bool False return only stds and not covariances

Random parameters

kB = KalmanFilterBase.init_random(3,4, 3, dtype=torch.float64)
kB
Kalman Filter
        N dim obs: 3,
        N dim state: 4,
        N dim contr: 3
kB.Q
tensor([[[1.5725, 0.3829, 0.0843, 0.3637],
         [0.3829, 1.4430, 0.3358, 1.1988],
         [0.0843, 0.3358, 1.7816, 0.7411],
         [0.3637, 1.1988, 0.7411, 1.6773]]], dtype=torch.float64,
       grad_fn=<UnsafeViewBackward0>)
kB.Q_C
tensor([[[1.2540, 0.0000, 0.0000, 0.0000],
         [0.3053, 1.1618, 0.0000, 0.0000],
         [0.0672, 0.2714, 1.3052, 0.0000],
         [0.2901, 0.9556, 0.3542, 0.7446]]], dtype=torch.float64,
       grad_fn=<DiagonalScatterBackward0>)
test_close(kB.Q_C @ kB.Q_C.mT, kB.Q, eps=2e-5)
kB.P0 = to_posdef(torch.rand(1, 4, 4, dtype=torch.float64))

check that assignment works :)

kB.P0 = to_posdef(torch.rand(4, 4, dtype=torch.float64))
kB.P0_C
tensor([[0.9349, 0.0000, 0.0000, 0.0000],
        [0.3928, 1.0748, 0.0000, 0.0000],
        [0.7406, 0.7533, 0.8326, 0.0000],
        [0.5903, 0.0391, 0.8217, 0.9638]], dtype=torch.float64,
       grad_fn=<DiagonalScatterBackward0>)
kB = KalmanFilterBase.init_random(3,4, 3, dtype=torch.float64)
kB
Kalman Filter
        N dim obs: 3,
        N dim state: 4,
        N dim contr: 3
list(kB.named_parameters())
[('A',
  Parameter containing:
  tensor([[[0.1751, 0.5375, 0.7676, 0.7450],
           [0.1204, 0.4777, 0.5823, 0.3786],
           [0.8484, 0.2317, 0.3969, 0.7088],
           [0.0270, 0.6178, 0.6097, 0.9128]]], dtype=torch.float64,
         requires_grad=True)),
 ('H',
  Parameter containing:
  tensor([[[0.4984, 0.1597, 0.4458, 0.7349],
           [0.1070, 0.9531, 0.1942, 0.6683],
           [0.9186, 0.7123, 0.1806, 0.5045]]], dtype=torch.float64,
         requires_grad=True)),
 ('B',
  Parameter containing:
  tensor([[[0.2827, 0.1138, 0.9378],
           [0.5594, 0.9364, 0.5136],
           [0.8592, 0.7647, 0.5183],
           [0.2376, 0.5618, 0.5096]]], dtype=torch.float64, requires_grad=True)),
 ('Q_raw',
  Parameter containing:
  tensor([[[0.4590, 0.0000, 0.0000, 0.0000],
           [0.5348, 0.5296, 0.0000, 0.0000],
           [0.0025, 0.0749, 0.4603, 0.0000],
           [0.0588, 0.5416, 0.9309, 0.5859]]], dtype=torch.float64,
         requires_grad=True)),
 ('R_raw',
  Parameter containing:
  tensor([[[0.1393, 0.0000, 0.0000],
           [0.0249, 0.7613, 0.0000],
           [0.7829, 0.4311, 0.9199]]], dtype=torch.float64, requires_grad=True)),
 ('b',
  Parameter containing:
  tensor([[[0.4661],
           [0.3918],
           [0.0571],
           [0.9529]]], dtype=torch.float64, requires_grad=True)),
 ('d',
  Parameter containing:
  tensor([[[0.9334],
           [0.5645],
           [0.0695]]], dtype=torch.float64, requires_grad=True)),
 ('m0',
  Parameter containing:
  tensor([[[0.8234],
           [0.3850],
           [0.3380],
           [0.4376]]], dtype=torch.float64, requires_grad=True)),
 ('P0_raw',
  Parameter containing:
  tensor([[[0.4467, 0.0000, 0.0000, 0.0000],
           [0.4818, 0.9502, 0.0000, 0.0000],
           [0.4770, 0.3478, 0.9812, 0.0000],
           [0.2460, 0.9602, 0.6476, 0.2647]]], dtype=torch.float64,
         requires_grad=True))]
kB.state_dict()
OrderedDict([('A',
              tensor([[[0.1751, 0.5375, 0.7676, 0.7450],
                       [0.1204, 0.4777, 0.5823, 0.3786],
                       [0.8484, 0.2317, 0.3969, 0.7088],
                       [0.0270, 0.6178, 0.6097, 0.9128]]], dtype=torch.float64)),
             ('H',
              tensor([[[0.4984, 0.1597, 0.4458, 0.7349],
                       [0.1070, 0.9531, 0.1942, 0.6683],
                       [0.9186, 0.7123, 0.1806, 0.5045]]], dtype=torch.float64)),
             ('B',
              tensor([[[0.2827, 0.1138, 0.9378],
                       [0.5594, 0.9364, 0.5136],
                       [0.8592, 0.7647, 0.5183],
                       [0.2376, 0.5618, 0.5096]]], dtype=torch.float64)),
             ('Q_raw',
              tensor([[[0.4590, 0.0000, 0.0000, 0.0000],
                       [0.5348, 0.5296, 0.0000, 0.0000],
                       [0.0025, 0.0749, 0.4603, 0.0000],
                       [0.0588, 0.5416, 0.9309, 0.5859]]], dtype=torch.float64)),
             ('R_raw',
              tensor([[[0.1393, 0.0000, 0.0000],
                       [0.0249, 0.7613, 0.0000],
                       [0.7829, 0.4311, 0.9199]]], dtype=torch.float64)),
             ('b',
              tensor([[[0.4661],
                       [0.3918],
                       [0.0571],
                       [0.9529]]], dtype=torch.float64)),
             ('d',
              tensor([[[0.9334],
                       [0.5645],
                       [0.0695]]], dtype=torch.float64)),
             ('m0',
              tensor([[[0.8234],
                       [0.3850],
                       [0.3380],
                       [0.4376]]], dtype=torch.float64)),
             ('P0_raw',
              tensor([[[0.4467, 0.0000, 0.0000, 0.0000],
                       [0.4818, 0.9502, 0.0000, 0.0000],
                       [0.4770, 0.3478, 0.9812, 0.0000],
                       [0.2460, 0.9602, 0.6476, 0.2647]]], dtype=torch.float64))])

From filter


init_from

k1 = KalmanFilter.init_random(3,4,3)
k2 = KalmanFilterSR.init_from(k1)
for p1, p2 in zip(k1.parameters(), k2.parameters()):
    test_close(p1, p2, eps=1e-3) # noise added by constraints

Get Info


source

KalmanFilterBase.get_info

 KalmanFilterBase.get_info ()
kB

Kalman Filter (3 obs, 4 state, 3 contr)

$A$

state x_0 x_1 x_2 x_3
x_0 0.1751 0.5375 0.7676 0.7450
x_1 0.1204 0.4777 0.5823 0.3786
x_2 0.8484 0.2317 0.3969 0.7088
x_3 0.0270 0.6178 0.6097 0.9128

$Q$

state x_0 x_1 x_2 x_3
x_0 0.9002 0.5074 0.0024 0.0558
x_1 0.5074 1.2712 0.0756 0.5690
x_2 0.0024 0.0756 0.9072 0.9246
x_3 0.0558 0.5690 0.9246 2.2208

$b$

state offset
x_0 0.4661
x_1 0.3918
x_2 0.0571
x_3 0.9529

$H$

variable x_0 x_1 x_2 x_3
y_0 0.4984 0.1597 0.4458 0.7349
y_1 0.1070 0.9531 0.1942 0.6683
y_2 0.9186 0.7123 0.1806 0.5045

$R$

variable y_0 y_1 y_2
y_0 0.5856 0.0190 0.5991
y_1 0.0190 1.3107 0.5130
y_2 0.5991 0.5130 2.3748

$d$

variable offset
y_0 0.9334
y_1 0.5645
y_2 0.0695

$B$

state c_0 c_1 c_2
x_0 0.2827 0.1138 0.9378
x_1 0.5594 0.9364 0.5136
x_2 0.8592 0.7647 0.5183
x_3 0.2376 0.5618 0.5096

$m_0$

state mean
x_0 0.8234
x_1 0.3850
x_2 0.3380
x_3 0.4376

$P_0$

state x_0 x_1 x_2 x_3
x_0 0.8860 0.4535 0.4490 0.2316
x_1 0.4535 1.8632 0.6740 1.3448
x_2 0.4490 0.6740 2.0374 1.2929
x_3 0.2316 1.3448 1.2929 2.0978

Test data

reset_seed()
data, mask, control = get_test_data()
show_as_row(data, mask, control)

data

tensor([[[0.8775,    nan,    nan],
         [0.6706,    nan, 0.9272],
         [   nan, 0.4967,    nan],
         [   nan,    nan,    nan],
         [   nan,    nan, 0.4760],
         [0.7606, 0.7759, 0.5243],
         [0.3714, 0.0426, 0.2343],
         [0.9991, 0.1775,    nan],
         [0.6734,    nan, 0.6468],
         [0.5825, 0.4599, 0.7960]],

        [[0.9038, 0.9735, 0.6428],
         [0.3725, 0.2052,    nan],
         [0.4448, 0.5775, 0.7237],
         [0.5927,    nan, 0.6441],
         [   nan, 0.9132, 0.0329],
         [0.4856, 0.9927, 0.5895],
         [0.2611, 0.9413, 0.1371],
         [0.8726, 0.5590, 0.8451],
         [0.1253, 0.9434, 0.0462],
         [0.2360, 0.0239, 0.8950]]], dtype=torch.float64)

mask

tensor([[[ True, False, False],
         [ True, False,  True],
         [False,  True, False],
         [False, False, False],
         [False, False,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True, False],
         [ True, False,  True],
         [ True,  True,  True]],

        [[ True,  True,  True],
         [ True,  True, False],
         [ True,  True,  True],
         [ True, False,  True],
         [False,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True]]])

control

tensor([[[0.9201, 0.6883, 0.5342],
         [0.7847, 0.3137, 0.1778],
         [0.5838, 0.9799, 0.3611],
         [0.3155, 0.7475, 0.5450],
         [0.5641, 0.2493, 0.8323],
         [0.9723, 0.1883, 0.3605],
         [0.5344, 0.3443, 0.7696],
         [0.3410, 0.7553, 0.3177],
         [0.0315, 0.5209, 0.6514],
         [0.3131, 0.4510, 0.3550]],

        [[0.4790, 0.0676, 0.3606],
         [0.7299, 0.6713, 0.3134],
         [0.7460, 0.1291, 0.4653],
         [0.5693, 0.9906, 0.8288],
         [0.9039, 0.5240, 0.6277],
         [0.3574, 0.0076, 0.6530],
         [0.8667, 0.9368, 0.8667],
         [0.6749, 0.3526, 0.6618],
         [0.0837, 0.7188, 0.7247],
         [0.3211, 0.4898, 0.9030]]], dtype=torch.float64)

Standard Kalman Filter

k = KalmanFilter.init_random(3,4,3)

Filter

Filter predict

Probability of the state at time \(t\) given the state at time \(t-1\) (a direct transcription of the equations is sketched after the list)

\(p(x_t) = \mathcal{N}(x_t; m_t^-, P_t^-)\) where:

  • predicted state mean: \(m_t^- = Am_{t-1} + B c_t + b\)

  • predicted state covariance: \(P_t^- = AP_{t-1}A^T + Q\)
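
As a reference, a direct transcription of these two equations (a sketch only; the notebook's _filter_predict and _filter_predict_cov_stand below are the actual helpers):

def predict_step(A, Q, b, B, m_prev, P_prev, c_t):
    m_minus = A @ m_prev + B @ c_t + b   # m_t^- = A m_{t-1} + B c_t + b
    P_minus = A @ P_prev @ A.mT + Q      # P_t^- = A P_{t-1} A^T + Q
    return m_minus, P_minus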

A, Q, b, B, m_pr,P_pr= (k.A, k.Q, k.b, k.B,torch.concat([k.m0]*2), torch.concat([k.P0]*2))
m_pr.shape, P_pr.shape, A.shape
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]), torch.Size([1, 4, 4]))

Covariance

Implement \(P_t^- = AP_{t-1}A^T + Q\)

P_m = _filter_predict_cov_stand(A, Q, P_pr)
P_m
tensor([[[6.2873, 3.7655, 4.8718, 5.8589],
         [3.7655, 4.8551, 4.7314, 5.1009],
         [4.8718, 4.7314, 5.9373, 6.0160],
         [5.8589, 5.1009, 6.0160, 8.5954]],

        [[6.2873, 3.7655, 4.8718, 5.8589],
         [3.7655, 4.8551, 4.7314, 5.1009],
         [4.8718, 4.7314, 5.9373, 6.0160],
         [5.8589, 5.1009, 6.0160, 8.5954]]], dtype=torch.float64,
       grad_fn=<AddBackward0>)

Mean

Predict

B.shape
torch.Size([1, 4, 3])
control.shape
torch.Size([2, 10, 3])
B[0].shape
torch.Size([4, 3])
B[0] @ control[0, 0].unsqueeze(-1)
tensor([[0.7217],
        [0.9572],
        [0.5742],
        [1.2138]], dtype=torch.float64, grad_fn=<MmBackward0>)
m_m, P_m = _filter_predict(
    A, Q, b, B,
    m_pr,P_pr, control[:, 0, :].unsqueeze(-1))
show_as_row(m_m, P_m)

m_m

tensor([[[2.4986],
         [2.2429],
         [1.8833],
         [3.6164]],

        [[2.1377],
         [1.5964],
         [1.5371],
         [2.8315]]], dtype=torch.float64, grad_fn=)

P_m

tensor([[[6.2873, 3.7655, 4.8718, 5.8589],
         [3.7655, 4.8551, 4.7314, 5.1009],
         [4.8718, 4.7314, 5.9373, 6.0160],
         [5.8589, 5.1009, 6.0160, 8.5954]],

        [[6.2873, 3.7655, 4.8718, 5.8589],
         [3.7655, 4.8551, 4.7314, 5.1009],
         [4.8718, 4.7314, 5.9373, 6.0160],
         [5.8589, 5.1009, 6.0160, 8.5954]]], dtype=torch.float64,
       grad_fn=)
(m_m.shape, P_m.shape,)
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))

Filter update

Probability of the state at time \(t\) given the observations at time \(t\)

\(p(x_t|y_t) = \mathcal{N}(x_t; m_t, P_t)\) where:

  • predicted obs mean: \(z_t = Hm_t^- + d\)

  • predicted obs covariance: \(S_t = HP_t^-H^T + R\)

  • Kalman gain: \(K_t = P_t^-H^TS_t^{-1}\)

  • corrected state mean: \(m_t = m_t^- + K_t(y_t - z_t)\)

  • corrected state covariance: \(P_t = (I-K_tH)P_t^-\)

If the observations are missing, this step is skipped and the corrected state is equal to the predicted state.

Need to figure out the NaNs for the gradients …

Kalman Gain

Don't compute the inverse of the matrix explicitly; instead use cholesky_solve to solve the corresponding linear system (a sketch follows).
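
A minimal sketch of this idea (k_gain_via_cholesky is a hypothetical helper, not necessarily the implementation of _filter_update_k_gain): factor \(S_t\) once with Cholesky and let torch.cholesky_solve do the solve, so \(S_t^{-1}\) is never formed explicitly.

import torch

def k_gain_via_cholesky(H, R, P_m):
    S = H @ P_m @ H.mT + R            # innovation covariance S_t
    S_C = torch.linalg.cholesky(S)    # S = S_C S_C^T
    # cholesky_solve solves S X = B, so X = S^{-1} H P^-, and K = X^T
    return torch.cholesky_solve(H @ P_m, S_C).mT  # K = P^- H^T S^{-1}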

H, d, R, obs = k.H, k.d, k.R, data[:,0,:].unsqueeze(-1)
K = _filter_update_k_gain(H, R, P_m)
K
tensor([[[ 0.1177,  0.2313,  0.0612],
         [-0.5573,  0.6582,  0.0288],
         [-0.3189,  0.5919, -0.0174],
         [ 0.3269,  0.3056, -0.0378]],

        [[ 0.1177,  0.2313,  0.0612],
         [-0.5573,  0.6582,  0.0288],
         [-0.3189,  0.5919, -0.0174],
         [ 0.3269,  0.3056, -0.0378]]], dtype=torch.float64,
       grad_fn=<TransposeBackward0>)
test_close(_filter_update_k_gain(H, R, P_m), P_m @ H.mT @ torch.inverse(H @ P_m @ H.mT + R))

Covariance

P = _filter_update_cov(H, K, P_m)
P
tensor([[[ 2.1074, -0.2158,  0.3602,  0.1911],
         [-0.2158,  0.4808,  0.0472, -0.1503],
         [ 0.3602,  0.0472,  0.8035, -0.0177],
         [ 0.1911, -0.1503, -0.0177,  0.8448]],

        [[ 2.1074, -0.2158,  0.3602,  0.1911],
         [-0.2158,  0.4808,  0.0472, -0.1503],
         [ 0.3602,  0.0472,  0.8035, -0.0177],
         [ 0.1911, -0.1503, -0.0177,  0.8448]]], dtype=torch.float64,
       grad_fn=<UnsafeViewBackward0>)

Mean

z = H @ m_m + d; z
(obs - z)
tensor([[[-3.3343],
         [    nan],
         [    nan]],

        [[-2.4181],
         [-5.0754],
         [-3.9145]]], dtype=torch.float64, grad_fn=<SubBackward0>)
m = _filter_update_mean(H, d, K, m_m, obs)
m
tensor([[[    nan],
         [    nan],
         [    nan],
         [    nan]],

        [[ 0.4396],
         [-0.5094],
         [-0.6276],
         [ 0.6379]]], dtype=torch.float64, grad_fn=<AddBackward0>)
m, P = _filter_update(H, d, R, m_m, P_m, obs)
show_as_row(m, P)
m.shape, P.shape

m

tensor([[[    nan],
         [    nan],
         [    nan],
         [    nan]],

        [[ 0.4396],
         [-0.5094],
         [-0.6276],
         [ 0.6379]]], dtype=torch.float64, grad_fn=)

P

tensor([[[ 2.1074, -0.2158,  0.3602,  0.1911],
         [-0.2158,  0.4808,  0.0472, -0.1503],
         [ 0.3602,  0.0472,  0.8035, -0.0177],
         [ 0.1911, -0.1503, -0.0177,  0.8448]],

        [[ 2.1074, -0.2158,  0.3602,  0.1911],
         [-0.2158,  0.4808,  0.0472, -0.1503],
         [ 0.3602,  0.0472,  0.8035, -0.0177],
         [ 0.1911, -0.1503, -0.0177,  0.8448]]], dtype=torch.float64,
       grad_fn=)
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))

There are NaNs in the output because there are NaNs in the observations.

The next functions add support for missing observations by also using a mask.

Missing observations

If all the observations at time \(t\) are missing, the correction step is skipped and the filtered state at time \(t\) is the same as the predicted state.

If only some observations are missing, a variation of the update equations can be used.

\(y^{ng}_t\) is a vector containing the observations that are not missing at time \(t\).

It can be expressed as a linear transformation of \(y_t\)

\[ y^{ng}_t = My_t\]

where \(M\) is a mask matrix used to select the subset of \(y_t\) that is observed. \(M \in \mathbb{R}^{n_{ng} \times n}\): each row is all zeros except for a single 1 in the column corresponding to a non-missing observation. Hence:

\[ p(y^{ng}_t) = \mathcal{N}(M\mu_{y_t}, M\Sigma_{y_t}M^T)\]

from which you can derive

\[ p(y^{ng}_t|x_t) = \mathcal{N}(MHx_t + Md, MRM^T) \tag{1}\]

Then the posterior \(p(x_t|y_t^{ng})\) can be computed analogously to the standard update equations as:

\[ p(x_t|y^{ng}_t) = \mathcal{N}(x_t; m_t, P_t) \tag{2}\]

where:

  • predicted obs mean: \(z_t = MHm_t^- + Md\)
  • predicted obs covariance: \(S_t = MHP_t^-(MH)^T + MRM^T\)
  • Kalman gain \(K_t = P_t^-(MH)^TS_t^{-1}\)
  • corrected state mean: \(m_t = m_t^- + K_t(My_t - z_t)\)
  • corrected state covariance: \(P_t = (I-K_tMH)P_t^-\)
Implementation details

In the implementation, the matrix multiplication \(MH\) can be replaced with H[m], where m is a boolean mask over the rows of \(H\), and \(MRM^T\) with R[m][:,m]

H, R, d,obs, mm = k.H, k.R, k.d, data[:,0,:].unsqueeze(-1), mask[:,0,:].unsqueeze(-1)
m = torch.tensor([False,True,True]) # mask batch
M = torch.tensor([[[0,1,0], # mask matrix
                  [0,0,1]]], dtype=torch.float64)
show_as_row(m, M, H, R)

m

tensor([False,  True,  True])

M

tensor([[[0., 1., 0.],
         [0., 0., 1.]]], dtype=torch.float64)

H

Parameter containing:
tensor([[[0.1349, 0.0519, 0.1180, 0.9767],
         [0.1679, 0.8635, 0.3753, 0.9760],
         [0.2125, 0.8049, 0.2124, 0.6794]]], dtype=torch.float64,
       requires_grad=True)

R

tensor([[[1.6259, 1.1183, 0.8535],
         [1.1183, 1.2831, 0.9935],
         [0.8535, 0.9935, 2.4550]]], dtype=torch.float64,
       grad_fn=)
M @ M.mT
tensor([[[1., 0.],
         [0., 1.]]], dtype=torch.float64)
M @ H, H[:, m]
(tensor([[[0.1679, 0.8635, 0.3753, 0.9760],
          [0.2125, 0.8049, 0.2124, 0.6794]]], dtype=torch.float64,
        grad_fn=<UnsafeViewBackward0>),
 tensor([[[0.1679, 0.8635, 0.3753, 0.9760],
          [0.2125, 0.8049, 0.2124, 0.6794]]], dtype=torch.float64,
        grad_fn=<IndexBackward0>))
M @ R @ M.mT, R[:,m][:,:,m]
(tensor([[[1.2831, 0.9935],
          [0.9935, 2.4550]]], dtype=torch.float64,
        grad_fn=<UnsafeViewBackward0>),
 tensor([[[1.2831, 0.9935],
          [0.9935, 2.4550]]], dtype=torch.float64, grad_fn=<IndexBackward0>))

With partially missing observations, _filter_update cannot easily be batched, because the shapes of the intermediate variables depend on the number of observed variables. So the idea is to divide the batch into blocks that share the same pattern of missing variables; a sketch of this grouping is given just before the batched update below.

mask_values, indices = torch.unique(mask[:,1,:], dim=0, return_inverse=True)
mask_values, indices
(tensor([[ True, False,  True],
         [ True,  True, False]]),
 tensor([0, 1]))
Update mask
mm = mask[0,0,:]
H[:, mm].shape, d[:, mm].shape, R[:, mm][:, :,mm].shape, obs[:, mm].shape
(torch.Size([1, 1, 4]),
 torch.Size([1, 1, 1]),
 torch.Size([1, 1, 1]),
 torch.Size([2, 1, 1]))
show_as_row(*_filter_update_mask(H, d, R, m_m, P_m, obs, mask[0, 0, :] ))

#0

tensor([[[0.7184],
         [0.7151],
         [0.0696],
         [1.1526]],

        [[0.8467],
         [0.4883],
         [0.2217],
         [1.0446]]], dtype=torch.float64, grad_fn=)

#1

tensor([[[2.3679, 0.4016, 0.8785, 0.4342],
         [0.4016, 1.9680, 1.3041, 0.4451],
         [0.8785, 1.3041, 1.8687, 0.4890],
         [0.4342, 0.4451, 0.4890, 1.0874]],

        [[2.3679, 0.4016, 0.8785, 0.4342],
         [0.4016, 1.9680, 1.3041, 0.4451],
         [0.8785, 1.3041, 1.8687, 0.4890],
         [0.4342, 0.4451, 0.4890, 1.0874]]], dtype=torch.float64,
       grad_fn=)
m, P = _filter_update_mask(H, d, R, m_m, P_m, obs, mask[0, 0, :] )
m.shape, P.shape
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))
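
Putting the pieces together, the batched update can be sketched (hypothetically; _filter_update_mask_batch below is the actual implementation) as grouping the batch elements by missingness pattern with torch.unique and calling _filter_update_mask once per group:

import torch

def update_by_pattern(H, d, R, m_m, P_m, obs, mask):
    patterns, idx = torch.unique(mask, dim=0, return_inverse=True)
    m_out, P_out = torch.empty_like(m_m), torch.empty_like(P_m)
    for i, pattern in enumerate(patterns):
        sel = idx == i  # batch elements sharing this missingness pattern
        m_out[sel], P_out[sel] = _filter_update_mask(
            H, d, R, m_m[sel], P_m[sel], obs[sel], pattern)
    return m_out, P_out
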
Update mask batch
m, P = _filter_update_mask_batch(H, d, R, m_m, P_m, obs, mask[:,0,:] )
show_as_row(m, P)
m.shape, P.shape

m

tensor([[[ 0.7184],
         [ 0.7151],
         [ 0.0696],
         [ 1.1526]],

        [[ 0.4396],
         [-0.5094],
         [-0.6276],
         [ 0.6379]]], dtype=torch.float64, grad_fn=)

P

tensor([[[ 2.3679,  0.4016,  0.8785,  0.4342],
         [ 0.4016,  1.9680,  1.3041,  0.4451],
         [ 0.8785,  1.3041,  1.8687,  0.4890],
         [ 0.4342,  0.4451,  0.4890,  1.0874]],

        [[ 2.1074, -0.2158,  0.3602,  0.1911],
         [-0.2158,  0.4808,  0.0472, -0.1503],
         [ 0.3602,  0.0472,  0.8035, -0.0177],
         [ 0.1911, -0.1503, -0.0177,  0.8448]]], dtype=torch.float64,
       grad_fn=)
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))
m.sum().backward(retain_graph=True) # check that pytorch can compute gradients with the whole batch and gradients aren't nan
H.grad
tensor([[[-1.1769, -2.7620, -0.9158, -2.4954],
         [-2.0431,  0.8269,  0.5074, -1.5867],
         [ 0.1472,  0.0285,  0.1012,  0.0356]]], dtype=torch.float64)

Filter All

The recursive version of the Kalman filter apparently breaks PyTorch's gradient computation, so a workaround is needed: during the loop the states are saved in a Python list and at the end they are combined back into a tensor (a minimal sketch of this pattern follows the list). The last lines of the function:

  • convert the lists to tensors
  • put the dimensions in the correct order
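
A minimal sketch of the list-then-stack pattern (stack_states is a hypothetical helper, not the library code):

import torch

def stack_states(state_list):
    "Combine per-timestep (mean, cov) pairs into tensors with a time dimension."
    ms = torch.stack([m for m, _ in state_list], dim=1)  # (n_batch, n_timesteps, n_dim_state, 1)
    Ps = torch.stack([P for _, P in state_list], dim=1)  # (n_batch, n_timesteps, n_dim_state, n_dim_state)
    return ms, Ps
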
filt_state, pred_state  = k._filter_all(data, mask, control)
(ms, Ps), (m_ms, P_ms) = filt_state, pred_state

Predictions at time 0 for both batches

show_as_row(*map(Self.shape(), (m_ms, P_ms, ms, Ps,)))

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])

#2

torch.Size([2, 10, 4, 1])

#3

torch.Size([2, 10, 4, 4])
show_as_row(*map(lambda x:x[0][0], (m_ms, P_ms, ms, Ps,)))

#0

tensor([[0.6498],
        [0.3257],
        [0.8462],
        [0.7727]], dtype=torch.float64, grad_fn=)

#1

tensor([[0.7511, 0.6229, 0.6414, 0.4561],
        [0.6229, 1.2138, 0.6974, 0.9224],
        [0.6414, 0.6974, 1.4388, 1.4051],
        [0.4561, 0.9224, 1.4051, 3.1638]], dtype=torch.float64,
       grad_fn=)

#2

tensor([[0.6392],
        [0.3073],
        [0.8191],
        [0.7180]], dtype=torch.float64, grad_fn=)

#3

tensor([[0.6695, 0.4821, 0.4340, 0.0368],
        [0.4821, 0.9707, 0.3394, 0.1987],
        [0.4340, 0.3394, 0.9115, 0.3391],
        [0.0368, 0.1987, 0.3391, 1.0092]], dtype=torch.float64,
       grad_fn=)

Filter

The filter method wraps _filter_all but in addition:

  • returns only the filtered state
  • removes the last dimension from the mean

source

KalmanFilter.filter

 KalmanFilter.filter (obs:torch.Tensor, mask:torch.Tensor,
                      control:torch.Tensor)

Filter observation

Type Details
obs Tensor ([n_batches], n_obs, [self.n_dim_obs]) where n_batches and n_dim_obs dimensions can be omitted if 1
mask Tensor ([n_batches], n_obs, [self.n_dim_obs]) where n_batches and n_dim_obs dimensions can be omitted if 1
control Tensor ([n_batches], n_obs, [self.n_dim_contr])
Returns ListMultiNormal Filtered state (n_batches, n_obs, self.n_dim_state)
filt = k.filter(data, mask, control)
filt.mean.shape, filt.cov.shape
(torch.Size([2, 10, 4, 1]), torch.Size([2, 10, 4, 4]))

Smooth

Smooth update step

Compute the probability of the state at time \(t\) given all the observations; a direct transcription of the following equations is sketched after the list.

\(p(x_t|Y) = \mathcal{N}(x_t; m_t^s, P_t^s)\) where:

  • Kalman smoothing gain: \(G_t = P_tA^T(P_{t+1}^-)^{-1}\)
  • smoothed mean: \(m_t^s = m_t + G_t(m_{t+1}^s - m_{t+1}^-)\)
  • smoothed covariance: \(P_t^s = P_t + G_t(P_{t+1}^s - P_{t+1}^-)G_t^T\)
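
A direct transcription of these three equations (a sketch using torch.inverse for readability; _smooth_gain, _smooth_mean and _smooth_cov below are the actual helpers):

import torch

def smooth_step(A, m_f, P_f, m_pred, P_pred, m_next_s, P_next_s):
    G = P_f @ A.mT @ torch.inverse(P_pred)      # smoothing gain G_t
    m_s = m_f + G @ (m_next_s - m_pred)         # smoothed mean m_t^s
    P_s = P_f + G @ (P_next_s - P_pred) @ G.mT  # smoothed covariance P_t^s
    return m_s, P_s
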
test_close(_smooth_gain(A, filt_state.cov, pred_state.cov), filt_state.cov @ A.mT @ torch.inverse(pred_state.cov))
K_p = _smooth_gain(A, filt_state[:,0].cov, pred_state[:,0].cov)
K_p.shape
torch.Size([2, 4, 4])
_smooth_mean(K_p, filt_state[:,0].mean, pred_state[:,0].mean, filt_state[:,0].mean)
tensor([[[ 0.6254],
         [ 0.2922],
         [ 0.7965],
         [ 0.6964]],

        [[-0.1829],
         [-0.7088],
         [-0.0211],
         [ 0.2570]]], dtype=torch.float64, grad_fn=<AddBackward0>)
_smooth_cov(K_p, filt_state[:,0].cov, pred_state[:,0].cov, filt_state[:,0].cov)
tensor([[[ 0.5327,  0.3321,  0.2094, -0.1773],
         [ 0.3321,  0.8061,  0.0930, -0.0363],
         [ 0.2094,  0.0930,  0.5427, -0.0125],
         [-0.1773, -0.0363, -0.0125,  0.6738]],

        [[ 0.3342,  0.0671,  0.0652, -0.2001],
         [ 0.0671,  0.3478, -0.1115, -0.1245],
         [ 0.0652, -0.1115,  0.4549, -0.0025],
         [-0.2001, -0.1245, -0.0025,  0.6981]]], dtype=torch.float64,
       grad_fn=<DivBackward0>)
show_as_row(*_smooth_update(A, filt_state[:, 0], pred_state[:, 0], filt_state[:, 0]))

#0

tensor([[[ 0.6254],
         [ 0.2922],
         [ 0.7965],
         [ 0.6964]],

        [[-0.1829],
         [-0.7088],
         [-0.0211],
         [ 0.2570]]], dtype=torch.float64, grad_fn=)

#1

tensor([[[ 0.5327,  0.3321,  0.2094, -0.1773],
         [ 0.3321,  0.8061,  0.0930, -0.0363],
         [ 0.2094,  0.0930,  0.5427, -0.0125],
         [-0.1773, -0.0363, -0.0125,  0.6738]],

        [[ 0.3342,  0.0671,  0.0652, -0.2001],
         [ 0.0671,  0.3478, -0.1115, -0.1245],
         [ 0.0652, -0.1115,  0.4549, -0.0025],
         [-0.2001, -0.1245, -0.0025,  0.6981]]], dtype=torch.float64,
       grad_fn=)
show_as_row(*map(Self.shape(), _smooth_update(A, MNormal(m_m, P_m), MNormal(m_m, P_m), MNormal(m_m, P_m))))

#0

torch.Size([2, 4, 1])

#1

torch.Size([2, 4, 4])

Smooth

smooth_state = _smooth(k.A,  filt_state, pred_state)
show_as_row(smooth_state.mean[0][0], smooth_state.cov[0][0])

#0

tensor([[-0.0785],
        [-0.5839],
        [-0.0959],
        [ 0.0031]], dtype=torch.float64, grad_fn=)

#1

tensor([[ 0.4578,  0.2300,  0.1755, -0.1481],
        [ 0.2300,  0.6676,  0.0346, -0.0171],
        [ 0.1755,  0.0346,  0.5665,  0.0642],
        [-0.1481, -0.0171,  0.0642,  0.7652]], dtype=torch.float64,
       grad_fn=)
show_as_row(smooth_state.mean.shape, smooth_state.cov.shape)

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])

KalmanFilter method


source

KalmanFilter.smooth

 KalmanFilter.smooth (obs:torch.Tensor, mask:torch.Tensor,
                      control:torch.Tensor)

Kalman Filter Smoothing

Type Details
obs Tensor
mask Tensor
control Tensor
Returns ListMultiNormal [n_timesteps, n_dim_state] smoothed state
smoothed_state = k.smooth(data, mask, control)
show_as_row(smoothed_state.mean.shape, smoothed_state.cov.shape)

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])
smoothed_state.mean.sum().backward(retain_graph=True)
A.grad
tensor([[[-4.9495, -5.2939, -7.8642,  0.2546],
         [ 0.2445,  8.3747, 13.5693, -7.4543],
         [-5.4811, -5.2497, -6.6092, -1.5187],
         [-1.5874,  2.1945,  3.6410, -2.3143]]], dtype=torch.float64)

Predict

The prediction at time \(t\) (\(y_t\)) is computed from the state (\(x_t\)) using this formula: \[p(y_t|x_t) = \mathcal{N}(Hx_t + d, R + HP^s_tH^T)\]

This works whether the state was filtered or smoothed.

This adds support for conditional predictions: at the time \(t\) for which we are making the predictions, some of the variables may actually have been observed. Since the model prediction is a normal distribution, we can condition on the observed values and thus improve the predictions. See conditional_gaussian.

For conditional predictions to make sense, the full covariance matrix of the predictions cannot be returned, only the standard deviations (a conditioning sketch follows).
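
A minimal sketch of the conditioning step for a single timestep (cond_pred is a hypothetical helper that parallels, rather than reproduces, _get_cond_pred / conditional_gaussian):

import torch

def cond_pred(mu, Sigma, y, m):
    "Condition N(mu, Sigma) on the observed entries y[m]; returns the moments of the missing part."
    S_oo = Sigma[m][:, m]                  # observed-observed block
    S_uo = Sigma[~m][:, m]                 # missing-observed block
    G = S_uo @ torch.inverse(S_oo)
    mu_u = mu[~m] + G @ (y[m] - mu[m])     # conditional mean
    S_uu = Sigma[~m][:, ~m] - G @ S_uo.mT  # conditional covariance
    return mu_u, S_uu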

test_m = torch.tensor(
    [[True, True, True,],
    [False, True, True],
    [False, False, False]]
)
torch.logical_xor(test_m.all(-1), test_m.any(-1)) # True for rows where some, but not all, variables are observed
tensor([False,  True, False])
A = torch.rand(2,2,3,3)
(A @ A).shape
torch.Size([2, 2, 3, 3])

predict can be vectorized across both the batch and the timesteps, except for timesteps that require conditional predictions

smoothed_state.mean.shape, smoothed_state.cov.shape
(torch.Size([2, 10, 4, 1]), torch.Size([2, 10, 4, 4]))
(k.H @ smoothed_state.mean).shape
torch.Size([2, 10, 3, 1])
pred_obs0 = k._obs_from_state(smoothed_state)
pred_obs0.mean.shape
torch.Size([2, 10, 3])
pred_obs0.cov.shape
torch.Size([2, 10, 3, 3])

source

KalmanFilter.predict

 KalmanFilter.predict (obs, mask, control, smooth=True)

Predicted observations at all times

pred = k.predict(data, mask, control)
pred.mean.shape, pred.std.shape
(torch.Size([2, 10, 3]), torch.Size([2, 10, 3]))

Gradients …

def get_grad_mask(x):
    "R_raw gradient after substituting the masked values with x"
    d = data.clone()
    d[~mask] = x  # fill the masked (missing) entries with x
    k.predict(d, mask, control).mean.sum().backward(retain_graph=True)
    grad = k.R_raw.grad.clone()
    k.zero_grad()
    return grad
get_grad_mask(10)
tensor([[[ -2.5082,   0.0000,   0.0000],
         [-11.7667,  -0.1206,   0.0000],
         [ -7.6345,  -8.3390,  -4.0905]]], dtype=torch.float64)
test_close(get_grad_mask(1), get_grad_mask(10))


@patch
def predict_times(self: KalmanFilter, times, obs, mask=None, smooth=True, check_args=None):
    """Predicted observations at specific times """
    state = self.smooth(obs, mask, check_args) if smooth else self.filter(obs, mask, check_args)
    obs, mask = self._parse_obs(obs, mask)
    times = array1d(times)
    
    n_timesteps = obs.shape[0]
    n_features = obs.shape[1] if len(obs.shape) > 1 else 1
    
    if times.max() > n_timesteps or times.min() < 0:
        raise ValueError(f"provided times range from {times.min()} to {times.max()}, which is outside allowed range : 0 to {n_timesteps}")

    means = torch.empty((times.shape[0], n_features), dtype=obs.dtype, device=obs.device)
    stds = torch.empty((times.shape[0], n_features), dtype=obs.dtype, device=obs.device) 
    for i, t in enumerate(times):
        mean, std = self._obs_from_state(
            state.mean[t],
            state.cov[t],
            {'t': t, **check_args} if check_args is not None else None
        )
        
        means[i], stds[i] = _get_cond_pred(ListNormal(mean, std), obs[t], mask[t])
    
    return ListNormal(means, stds)

Numerically Stable Kalman Filter (Square Root Filter)

Instead of propagating the full covariance matrices, the square root filter propagates their Cholesky factors, which keeps the covariances symmetric and positive definite despite rounding errors.

kSR = KalmanFilterSR.init_random(3,4,3)

Filter

Filter predict

Covariance

Implement the numerically stable version of the covariance update

A, Q, b, B, m_pr,P_pr= (k.A, k.Q, k.b, k.B,torch.concat([k.m0]*2), torch.concat([k.P0]*2))
Q_C = kSR.Q_C
_filter_predict_cov_stand(kSR.A, kSR.Q_C @ kSR.Q_C.mT, P_pr)
tensor([[[1.9504, 2.3535, 2.1727, 2.2936],
         [2.3535, 5.8436, 5.0464, 6.0831],
         [2.1727, 5.0464, 6.2940, 6.0969],
         [2.2936, 6.0831, 6.0969, 8.1538]],

        [[1.9504, 2.3535, 2.1727, 2.2936],
         [2.3535, 5.8436, 5.0464, 6.0831],
         [2.1727, 5.0464, 6.2940, 6.0969],
         [2.2936, 6.0831, 6.0969, 8.1538]]], dtype=torch.float64,
       grad_fn=<AddBackward0>)
P_pr_C = torch.linalg.cholesky(P_pr)

\[W = \begin{bmatrix}AC_{t-1}&C_Q\end{bmatrix}\]

W = torch.concat([A @ P_pr_C, Q_C.expand_as(P_pr_C)], dim=-1)
W.shape
torch.Size([2, 4, 8])
P_m_C = torch.linalg.qr(W.mT).R.mT
P_m_C
tensor([[[-2.5709,  0.0000,  0.0000,  0.0000],
         [-1.7904,  1.1590,  0.0000,  0.0000],
         [-1.9907,  1.3402,  1.2995,  0.0000],
         [-2.2985,  0.5614,  0.8125, -1.0728]],

        [[-2.5709,  0.0000,  0.0000,  0.0000],
         [-1.7904,  1.1590,  0.0000,  0.0000],
         [-1.9907,  1.3402,  1.2995,  0.0000],
         [-2.2985,  0.5614,  0.8125, -1.0728]]], dtype=torch.float64,
       grad_fn=<TransposeBackward0>)
P_m_C @ P_m_C.mT
tensor([[[6.6096, 4.6031, 5.1178, 5.9093],
         [4.6031, 4.5489, 5.1174, 4.7660],
         [5.1178, 5.1174, 7.4476, 6.3838],
         [5.9093, 4.7660, 6.3838, 7.4094]],

        [[6.6096, 4.6031, 5.1178, 5.9093],
         [4.6031, 4.5489, 5.1174, 4.7660],
         [5.1178, 5.1174, 7.4476, 6.3838],
         [5.9093, 4.7660, 6.3838, 7.4094]]], dtype=torch.float64,
       grad_fn=<UnsafeViewBackward0>)
P_m = _filter_predict_cov_stand(A, Q_C @ Q_C.mT, P_pr)
test_close(P_m, P_m_C @ P_m_C.mT)
(P_m - P_m_C @ P_m_C.mT).max()
tensor(8.8818e-16, dtype=torch.float64, grad_fn=<MaxBackward1>)
test_P_m_C = torch.linalg.cholesky(P_m)
(test_P_m_C @ test_P_m_C.mT - P_m_C @ P_m_C.mT).max()
tensor(8.8818e-16, dtype=torch.float64, grad_fn=<MaxBackward1>)
(test_P_m_C - P_m_C).max()
tensor(5.1418, dtype=torch.float64, grad_fn=<MaxBackward1>)

The matrix square root is not unique! The factor obtained from the QR decomposition differs from the Cholesky factor, but the solution is correct

P_m_C = _filter_predict_cov_SR(A, Q_C, P_pr_C)
test_close(P_m_C @ P_m_C.mT, _filter_predict_cov_stand(A, Q_C @ Q_C.mT, P_pr))
def fuzz_filter_predict_cov_SR(n=10):
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        A, Q_C, b, B, m_pr,P_pr = (kSR.A.unsqueeze(0), kSR.Q_C.unsqueeze(0), kSR.b.unsqueeze(-1),
                                                  kSR.B.unsqueeze(0),
                                                  torch.stack([kSR.m0]*2).unsqueeze(-1),
                                                  torch.stack([kSR.P0]*2))
        P_pr_C = torch.linalg.cholesky(P_pr)
        P_m_C = _filter_predict_cov_SR(A, Q_C, P_pr_C)
        test_close(P_m_C @ P_m_C.mT, _filter_predict_cov_stand(A, Q_C @ Q_C.mT, P_pr), eps=5e-13)
fuzz_filter_predict_cov_SR()

Predict

m_m, P_m_C = _filter_predict_SR(kSR.A, kSR.Q_C, kSR.b, kSR.B, m_pr, P_pr_C, control[:,0].unsqueeze(-1)) 
show_as_row(m_m, P_m_C)

m_m

tensor([[[1.9549],
         [2.4811],
         [3.3275],
         [3.5545]],

        [[1.1647],
         [2.3109],
         [2.4892],
         [2.8133]]], dtype=torch.float64, grad_fn=)

P_m_C

tensor([[[-1.3966,  0.0000,  0.0000,  0.0000],
         [-1.6852, -1.7331,  0.0000,  0.0000],
         [-1.5557, -1.3991,  1.3843,  0.0000],
         [-1.6423, -1.9130,  0.6253, -1.1858]],

        [[-1.3966,  0.0000,  0.0000,  0.0000],
         [-1.6852, -1.7331,  0.0000,  0.0000],
         [-1.5557, -1.3991,  1.3843,  0.0000],
         [-1.6423, -1.9130,  0.6253, -1.1858]]], dtype=torch.float64,
       grad_fn=)
is_posdef(P_m_C @ P_m_C.mT).all()
tensor(True)

Filter Update

def is_sr(x_C, x): return torch.allclose(x_C @ x_C.mT, x)

source

cat_2d

 cat_2d (x)
Details
x matrix as list of list of Tensor
x = [[]]

Example calculations

H, R, R_C, obs = kSR.H, kSR.R, kSR.R_C, data[:,0,:].unsqueeze(-1)
P_m = P_m_C @ P_m_C.mT

# use standard filter to compute expected result
S = H @ P_m @ H.mT + R
S_C = torch.linalg.cholesky(S)
K = _filter_update_k_gain(H, R, P_m)
K_bar = K @ S_C
P_stand = _filter_update_cov(H, K, P_m)
P_C_stand = torch.linalg.cholesky(P_stand)
assert all([is_sr(R_C, R), is_sr(S_C, S)])

\[M = \begin{bmatrix} R^{1/2} & H(P^-)^{1/2} \\ 0 & (P^-)^{1/2} \end{bmatrix}\]

M = cat_2d([[R_C.expand(2,-1,-1)              , H @ P_m_C], 
            [torch.zeros_like((H @ P_m_C).mT),  P_m_C]])
M[0]
tensor([[ 1.2478,  0.0000,  0.0000, -1.2662, -1.1753,  0.7242, -0.3448],
        [ 0.3190,  1.2801,  0.0000, -2.1349, -1.5926,  1.0952, -0.3843],
        [ 0.0341,  0.9601,  0.7766, -3.2461, -2.1290,  0.8250, -0.5179],
        [ 0.0000,  0.0000,  0.0000, -1.3966,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000, -1.6852, -1.7331,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000, -1.5557, -1.3991,  1.3843,  0.0000],
        [ 0.0000,  0.0000,  0.0000, -1.6423, -1.9130,  0.6253, -1.1858]],
       dtype=torch.float64, grad_fn=<SelectBackward0>)

\[V = \begin{bmatrix} S^{1/2} & 0 \\ \bar{K} & P^{1/2} \end{bmatrix}\]

# V from standard filter
V_stand = cat_2d([[S_C,   torch.zeros_like(K_bar.mT)],
                  [K_bar, P_C_stand]])
V_stand[0]
tensor([[ 2.2770,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 2.5904,  1.8631,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 3.2634,  2.2593,  1.3379,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.7766,  0.5205,  0.6151,  0.8355,  0.0000,  0.0000,  0.0000],
        [ 1.8316,  0.8659,  0.9168, -0.1000,  0.9427,  0.0000,  0.0000],
        [ 2.0275,  0.9734,  0.2654, -0.0859,  0.2527,  1.0461,  0.0000],
        [ 2.2790,  0.9606,  0.6923, -0.4813,  0.4183,  0.2013,  1.0540]],
       dtype=torch.float64, grad_fn=<SelectBackward0>)
test_close(M @ M.mT, V_stand @ V_stand.mT)
V_sr = torch.linalg.qr(M.mT).R.mT
V_sr[0]
tensor([[-2.2770,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [-2.5904, -1.8631,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [-3.2634, -2.2593, -1.3379,  0.0000,  0.0000,  0.0000,  0.0000],
        [-0.7766, -0.5205, -0.6151,  0.8355,  0.0000,  0.0000,  0.0000],
        [-1.8316, -0.8659, -0.9168, -0.1000,  0.9427,  0.0000,  0.0000],
        [-2.0275, -0.9734, -0.2654, -0.0859,  0.2527, -1.0461,  0.0000],
        [-2.2790, -0.9606, -0.6923, -0.4813,  0.4183, -0.2013, -1.0540]],
       dtype=torch.float64, grad_fn=<SelectBackward0>)
test_close(V_sr @ V_sr.mT, V_stand @ V_stand.mT)
n_dim_obs = R_C.shape[-1]
P_C = V_sr[:, n_dim_obs:, n_dim_obs:]
P_C
tensor([[[ 0.8355,  0.0000,  0.0000,  0.0000],
         [-0.1000,  0.9427,  0.0000,  0.0000],
         [-0.0859,  0.2527, -1.0461,  0.0000],
         [-0.4813,  0.4183, -0.2013, -1.0540]],

        [[ 0.8355,  0.0000,  0.0000,  0.0000],
         [-0.1000,  0.9427,  0.0000,  0.0000],
         [-0.0859,  0.2527, -1.0461,  0.0000],
         [-0.4813,  0.4183, -0.2013, -1.0540]]], dtype=torch.float64,
       grad_fn=<SliceBackward0>)
is_sr(P_C, P_stand)
True

\(P_C\) computed with the QR decomposition is not the same as the one from the Cholesky decomposition, but that's okay! Both are valid square roots of the posterior covariance

(P_C == P_C_stand).all()
tensor(False)
(H @ P_m_C).shape
torch.Size([2, 3, 4])

Covariance


source

tensor_info

 tensor_info (x)
P_C, S_C = _filter_update_cov_SR(kSR.H, kSR.R_C, P_m_C)
def fuzz_filter_update_cov_SR(n=10):
    errs = []
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        H, R_C, P_m = (kSR.H, kSR.R_C, torch.cat([kSR.P0]*5))
        R = R_C @ R_C.mT
        P_m_C = torch.linalg.cholesky(P_m)
        P_C, _ = _filter_update_cov_SR(H, R_C, P_m_C)
        K = _filter_update_k_gain(H, R, P_m)
        P_stand = _filter_update_cov(H, K, P_m)
        errs.append((P_C @ P_C.mT -  P_stand).abs().max().item())
    return torch.tensor(errs)
err = fuzz_filter_update_cov_SR(100)
assert err.max() < torch.tensor(1e-10)
err.median(), err.max()
(tensor(2.8588e-15), tensor(1.0214e-14))

Kalman Gain

Don't compute the inverse of the matrix explicitly; instead reuse \(S^{1/2}\) with cholesky_solve to solve the linear system (a sketch follows).
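
A plausible sketch (k_gain_SR_sketch is hypothetical, not necessarily _filter_update_k_gain_SR itself): reuse the factor \(S^{1/2}\) already produced by _filter_update_cov_SR and solve with torch.cholesky_solve:

import torch

def k_gain_SR_sketch(H, P_m_C, S_C):
    P_m = P_m_C @ P_m_C.mT  # rebuild P^- from its square root
    # K = P^- H^T S^{-1}, with S = S_C S_C^T
    return torch.cholesky_solve(H @ P_m, S_C).mT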

S = kSR.H @ P_m_C @ P_m_C.mT @ kSR.H.mT + kSR.R
test_close(kSR.R, kSR.R_C @ kSR.R_C.mT)
test_close(S_C @ S_C.mT, S)
_filter_update_k_gain_SR(H, P_m_C, S_C)
tensor([[[-0.0014, -0.2782,  0.4598],
         [ 0.2389, -0.3662,  0.6852],
         [ 0.2854,  0.2819,  0.1984],
         [ 0.3866, -0.1119,  0.5174]],

        [[-0.0014, -0.2782,  0.4598],
         [ 0.2389, -0.3662,  0.6852],
         [ 0.2854,  0.2819,  0.1984],
         [ 0.3866, -0.1119,  0.5174]]], dtype=torch.float64,
       grad_fn=<TransposeBackward0>)
test_close(_filter_update_k_gain(H, R, P_m), _filter_update_k_gain_SR(H, P_m_C, S_C))
def fuzz_kalman_gain_SR(n=10):
    errs = []
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        H, R_C, P_m = (kSR.H, kSR.R_C, torch.cat([kSR.P0]*5))
        R = R_C @ R_C.mT
        P_m_C = torch.linalg.cholesky(P_m)
        P_C, S_C = _filter_update_cov_SR(H, R_C, P_m_C)
        K_stand = _filter_update_k_gain(H, R, P_m)
        K = _filter_update_k_gain_SR(H, P_m_C, S_C)
        errs.append((K_stand - K).abs().max().item())
    return torch.tensor(errs)
err = fuzz_kalman_gain_SR(100)
assert err.max() < torch.tensor(1e-10)
err.median(), err.max()
(tensor(4.5797e-15), tensor(2.5202e-14))

Measurement update

m, P_C = _filter_update_SR(H, d, R_C, m_m, P_m_C, obs)
show_as_row(m, P_C)
m.shape, P_C.shape

m

tensor([[[   nan],
         [   nan],
         [   nan],
         [   nan]],

        [[0.1197],
         [0.3457],
         [0.5033],
         [0.6044]]], dtype=torch.float64, grad_fn=)

P_C

tensor([[[ 0.8355,  0.0000,  0.0000,  0.0000],
         [-0.1000,  0.9427,  0.0000,  0.0000],
         [-0.0859,  0.2527, -1.0461,  0.0000],
         [-0.4813,  0.4183, -0.2013, -1.0540]],

        [[ 0.8355,  0.0000,  0.0000,  0.0000],
         [-0.1000,  0.9427,  0.0000,  0.0000],
         [-0.0859,  0.2527, -1.0461,  0.0000],
         [-0.4813,  0.4183, -0.2013, -1.0540]]], dtype=torch.float64,
       grad_fn=)
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))
get_test_data(1, 5,4, bs=1)[1].shape
torch.Size([1, 1, 5])
def fuzz_filter_update_SR(n=10):
    errs = {'mean': [], 'cov': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        H, d, R, R_C, m_m, P_m = (kSR.H, kSR.d, kSR.R, kSR.R_C, torch.cat([kSR.m0]*5), torch.cat([kSR.P0]*5))
        obs = torch.randn_like(H @ m_m)
        P_m_C = torch.linalg.cholesky(P_m)
        mSR, P_m_C = _filter_update_SR(H, d, R_C, m_m, P_m_C, obs)
        m, P_C = _filter_update(H, d, R, m_m, P_m, obs)
        errs['mean'].append((mSR - m).abs().max().item())
        errs['cov'].append((P_C - P_m_C @ P_m_C.mT).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_filter_update_SR(100)

err.median(), err.max()
(mean    1.049161e-14
 cov     2.740863e-15
 dtype: float64,
 mean    5.573320e-14
 cov     1.099121e-14
 dtype: float64)

Missing observations

Update mask

Here we need to compute the square root of the masked \(R\) from scratch, because the mask cannot be applied directly to \(R^{1/2}\)

R
tensor([[[1.5571, 0.3980, 0.0426],
         [0.3980, 1.7403, 1.2398],
         [0.0426, 1.2398, 1.5260]]], dtype=torch.float64,
       grad_fn=<UnsafeViewBackward0>)
is_posdef(R)
tensor([True])
R_m = torch.tensor([[1.5571,  0.0426], [0.0426, 1.5259]])
R_m
tensor([[1.5571, 0.0426],
        [0.0426, 1.5259]])
is_posdef(R_m)
tensor(True)
m = [True, False, True]
is_posdef(R[:, m,:][:, :, m])
tensor([True])
H_m, d_m, R_m, R_C_m, obs_m, = H[:, m,:], d[:, m,:], R[:, m,:][:, :,m], R_C[:, m,:][:, :,m], obs[:, m]
R2 = R_m
R2_C_m = torch.linalg.cholesky(R_m)
is_sr(R_C_m, R_m)
False
_filter_update_SR(H_m, d_m, R_C_m, m_m, P_m_C, obs_m)[0] - _filter_update(H_m, d_m, R_m, m_m, P_m_C @ P_m_C.mT, obs_m)[0]
tensor([[[    nan],
         [    nan],
         [    nan],
         [    nan]],

        [[-0.1152],
         [-0.1834],
         [-0.1472],
         [-0.1785]]], dtype=torch.float64, grad_fn=<SubBackward0>)
_filter_update_SR(H_m, d_m, R2_C_m, m_m, P_m_C, obs_m)[0] - _filter_update(H_m, d_m, R_m, m_m, P_m_C @ P_m_C.mT, obs_m)[0]
tensor([[[        nan],
         [        nan],
         [        nan],
         [        nan]],

        [[-4.4409e-16],
         [-4.4409e-16],
         [ 0.0000e+00],
         [-8.8818e-16]]], dtype=torch.float64, grad_fn=<SubBackward0>)
show_as_row(*_filter_update_mask_SR(H, d, R_C, m_m, P_m_C, obs, mask[0, 0, :] ))

#0

tensor([[[1.3313],
         [1.0104],
         [1.6996],
         [1.7246]],

        [[0.7732],
         [1.3875],
         [1.4670],
         [1.6642]]], dtype=torch.float64, grad_fn=)

#1

tensor([[[ 1.1441,  0.0000,  0.0000,  0.0000],
         [ 0.7349,  1.3175,  0.0000,  0.0000],
         [ 0.4355,  0.5900, -1.1768,  0.0000],
         [ 0.3596,  1.0472, -0.3473, -1.1330]],

        [[ 1.1441,  0.0000,  0.0000,  0.0000],
         [ 0.7349,  1.3175,  0.0000,  0.0000],
         [ 0.4355,  0.5900, -1.1768,  0.0000],
         [ 0.3596,  1.0472, -0.3473, -1.1330]]], dtype=torch.float64,
       grad_fn=)
m, P_C = _filter_update_mask_SR(H, d, R_C, m_m, P_m_C, obs, mask[0, 0, :] )
m.shape, P_C.shape
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))
mask[0,0].shape
torch.Size([3])
def fuzz_filter_update_SR(n=10):
    errs = {'mean': [], 'cov': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        H, d, R, R_C, m_m, P_m = (kSR.H, kSR.d, kSR.R, kSR.R_C, torch.cat([kSR.m0]*5), torch.cat([kSR.P0]*5))
        obs, mask, _  = get_test_data(1, 5, 4, bs=5)
        obs, mask = obs[:,0].unsqueeze(-1), mask[0,0]
        
        P_m_C = torch.linalg.cholesky(P_m)
        mSR, P_m_C = _filter_update_mask_SR(H, d, R, m_m, P_m_C, obs, mask)
        m, P_C = _filter_update_mask(H, d, R, m_m, P_m, obs, mask)
        errs['mean'].append((mSR - m).abs().max().item())
        errs['cov'].append((P_C - P_m_C @ P_m_C.mT).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_filter_update_SR(100)

err.median(), err.max()
(mean    3.996803e-15
 cov     1.970646e-15
 dtype: float64,
 mean    3.996803e-15
 cov     7.827072e-15
 dtype: float64)
Update mask batch
m, P_C = _filter_update_mask_batch_SR(H, d, R, m_m, P_m_C, obs, mask[:,0,:] )
show_as_row(m, P_C)
m.shape, P_C.shape

m

tensor([[[1.3685],
         [1.0982],
         [1.7967],
         [1.8337]],

        [[0.1197],
         [0.3457],
         [0.5033],
         [0.6044]]], dtype=torch.float64, grad_fn=)

P_C

tensor([[[ 1.1607,  0.0000,  0.0000,  0.0000],
         [ 0.8022,  1.3585,  0.0000,  0.0000],
         [ 0.5153,  0.6769, -1.2081,  0.0000],
         [ 0.4512,  1.1387, -0.3915, -1.1430]],

        [[ 0.8355,  0.0000,  0.0000,  0.0000],
         [-0.1000,  0.9427,  0.0000,  0.0000],
         [-0.0859,  0.2527, -1.0461,  0.0000],
         [-0.4813,  0.4183, -0.2013, -1.0540]]], dtype=torch.float64,
       grad_fn=)
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))
m.sum().backward(retain_graph=True) # check that pytorch can compute gradients with the whole batch and gradients aren't nan
H.grad
tensor([[[-5.3359, -5.2561, -7.0879, -7.6131],
         [ 0.0176, -0.2548, -0.2340, -0.2578],
         [-0.2513, -0.9494, -1.2821, -1.5227]]], dtype=torch.float64)

Filter All

The recursive version of the Kalman filter apparently breaks PyTorch's gradient computation, so a workaround is needed: during the loop the states are saved in a Python list and at the end they are combined back into a tensor. The last lines of the function:

  • convert the lists to tensors
  • put the dimensions in the correct order
filt_stateSR, pred_stateSR  = kSR._filter_all(data, mask, control)
(ms, P_Cs), (m_ms, P_m_Cs) = filt_stateSR, pred_stateSR

Predictions at time 0 for both batches

show_as_row(*map(Self.shape(), (m_ms, P_m_Cs, ms, P_Cs,)))

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])

#2

torch.Size([2, 10, 4, 1])

#3

torch.Size([2, 10, 4, 4])
show_as_row(*map(lambda x:x[0][0], (m_ms, P_m_Cs, ms, P_Cs,)))

#0

tensor([[0.4499],
        [0.8575],
        [0.2647],
        [0.4293]], dtype=torch.float64, grad_fn=)

#1

tensor([[1.2562, 0.0000, 0.0000, 0.0000],
        [0.3804, 0.9666, 0.0000, 0.0000],
        [0.9536, 0.7909, 1.2431, 0.0000],
        [0.4487, 0.4993, 0.4600, 1.2599]], dtype=torch.float64,
       grad_fn=)

#2

tensor([[0.5189],
        [0.9208],
        [0.4203],
        [0.5421]], dtype=torch.float64, grad_fn=)

#3

tensor([[-1.1638,  0.0000,  0.0000,  0.0000],
        [-0.2344, -0.9142,  0.0000,  0.0000],
        [-0.5963, -0.5742, -1.1218,  0.0000],
        [-0.1702, -0.3039, -0.2622,  1.2088]], dtype=torch.float64,
       grad_fn=)

Filter

The filter method wraps _filter_all but in addition:

  • returns only the filtered state
  • removes the last dimension from the mean

source

KalmanFilterSR.filter

 KalmanFilterSR.filter (obs:torch.Tensor, mask:torch.Tensor,
                        control:torch.Tensor)

Filter observation

Type Details
obs Tensor ([n_batches], n_obs, [self.n_dim_obs]) where n_batches and n_dim_obs dimensions can be omitted if 1
mask Tensor ([n_batches], n_obs, [self.n_dim_obs]) where n_batches and n_dim_obs dimensions can be omitted if 1
control Tensor ([n_batches], n_obs, [self.n_dim_contr])
Returns ListMultiNormal Filtered state (n_batches, n_obs, self.n_dim_state)
filtSR = kSR.filter(data, mask, control)
filtSR.mean.shape, filtSR.cov.shape
(torch.Size([2, 10, 4, 1]), torch.Size([2, 10, 4, 4]))
def fuzz_filter_SR(n=10):
    errs = {'mean': [], 'cov': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        k = KalmanFilter.init_from(kSR)
        dat = get_test_data(20, 5, 4)
        mean, cov = k.filter(*dat)
        meanSR, covSR = kSR.filter(*dat)
        covSR = covSR @ covSR.mT
        errs['mean'].append((meanSR - mean).abs().max().item())
        errs['cov'].append((covSR - cov).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_filter_SR(10)

err.median(), err.max()
(mean    8.108847e-12
 cov     8.149592e-12
 dtype: float64,
 mean    4.446576e-11
 cov     1.261309e-10
 dtype: float64)

Smooth

Smooth update step

Compute the probability of the state at time \(t\) given all the observations

\(p(x_t|Y) = \mathcal{N}(x_t; m_t^s, P_t^s)\) where:

  • Kalman smoothing gain: \(G_t = P_tA^T(P_{t+1}^-)^{-1}\)
  • smoothed mean: \(m_t^s = m_t + G_t(m_{t+1}^s - m_{t+1}^-)\)
  • smoothed covariance: \(P_t^s = P_t + G_t(P_{t+1}^s - P_{t+1}^-)G_t^T\)
filt_state = ListMNormal(filt_stateSR.mean, filt_stateSR.cov @ filt_stateSR.cov.mT)
pred_state = ListMNormal(pred_stateSR.mean, pred_stateSR.cov @ pred_stateSR.cov.mT)
K_p = _smooth_gain_SR(kSR.A, filt_stateSR[:, 0].cov, pred_stateSR[:, 0].cov)
test_close(
    _smooth_gain_SR(kSR.A, filt_stateSR[:, 0].cov, pred_stateSR[:, 0].cov),
    _smooth_gain(kSR.A, filt_state[:, 0].cov, pred_state[:, 0].cov)
)
test_close(filt_state[0,0].cov, filt_stateSR[0,0].cov @ filt_stateSR[0,0].cov.mT)
test_close(pred_state[0,0].cov, pred_stateSR[0,0].cov @ pred_stateSR[0,0].cov.mT)
def fuzz_gain_SR(n=10):
    errs = {'K': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        k = KalmanFilter.init_from(kSR)
        dat = get_test_data(2, 5, 4)
        (_, f_cov), (_, p_cov) = k._filter_all(*dat)
        (_, f_covSR), (_, p_covSR) = kSR._filter_all(*dat)
        K = _smooth_gain(k.A, f_cov[:, 0], p_cov[:, 0])
        K_SR = _smooth_gain_SR(k.A, f_covSR[:, 0], p_covSR[:, 0])
        errs['K'].append((K - K_SR).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_gain_SR(10)

err.median(), err.max()
(K    1.920686e-14
 dtype: float64,
 K    4.352074e-14
 dtype: float64)
m_p, P_p = _smooth_update_SR(kSR.A, filt_stateSR[:, 0, :], pred_stateSR[:, 0, :], filt_stateSR[:, 0, :])
test_close(filt_state[:, 0, :].mean, filt_stateSR[:, 0, :].mean)
_smooth_update(kSR.A, filt_state[0,0],  pred_state[0,0] , filt_state[0,0] )
MultiNormal(mean=tensor([[[0.6325],
         [0.9899],
         [0.5680],
         [0.6284]]], dtype=torch.float64, grad_fn=<AddBackward0>), cov=tensor([[[ 0.7480, -0.0959, -0.0944, -0.2622],
         [-0.0959,  0.6667,  0.1856,  0.0380],
         [-0.0944,  0.1856,  0.9192, -0.0280],
         [-0.2622,  0.0380, -0.0280,  1.3021]]], dtype=torch.float64,
       grad_fn=<DivBackward0>))
_smooth_update_SR(kSR.A, filt_stateSR[0,0],  pred_stateSR[0,0], filt_stateSR[0,0])
MultiNormal(mean=tensor([[[0.6325],
         [0.9899],
         [0.5680],
         [0.6284]]], dtype=torch.float64, grad_fn=<AddBackward0>), cov=tensor([[[-1.8502, -2.4971, -5.2813, -3.0804],
         [-2.4971, -1.6908, -4.8194, -2.8887],
         [-5.2813, -4.8194, -9.7933, -6.0797],
         [-3.0804, -2.8887, -6.0797, -2.6434]]], dtype=torch.float64,
       grad_fn=<DivBackward0>))
test_close((filt_stateSR.mean, pred_stateSR.mean), (filt_state.mean, pred_state.mean))
test_close(
    _smooth_update_SR(kSR.A, filt_stateSR[:, 0], pred_stateSR[:, 0], filt_state[:, 0]),
    _smooth_update(kSR.A, filt_state[:, 0], pred_state[:, 0], filt_state[:, 0])
)
def fuzz_update_SR(n=10):
    errs = {'mean': [], 'cov': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        k = KalmanFilter.init_from(kSR)
        dat = get_test_data(2, 5, 4)
        f_state, p_state = k._filter_all(*dat)
        f_stateSR, p_stateSR = kSR._filter_all(*dat)
        mean, cov = _smooth_update(k.A, f_state[:, 0], p_state[:, 0], f_state[:, 1])
        meanSR, covSR = _smooth_update_SR(k.A, f_stateSR[:, 0], p_stateSR[:, 0], f_state[:, 1])
        errs['mean'].append((meanSR - mean).abs().max().item())
        errs['cov'].append((covSR - cov).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_update_SR(10)

err.median(), err.max()
(mean    1.154632e-14
 cov     2.771117e-13
 dtype: float64,
 mean    3.375078e-14
 cov     8.242296e-13
 dtype: float64)

Smooth

smooth_state = _smooth_SR(kSR.A,  filt_stateSR, pred_stateSR)
is_sr(filt_stateSR.cov, filt_state.cov)
True
s_mean, s_cov =  _smooth(kSR.A,  filt_state, pred_state)
s_meanSR, s_covSR =  _smooth_SR(kSR.A,  filt_stateSR, pred_stateSR)
test_close(s_cov, s_covSR)
(s_cov - s_covSR).median(), (s_cov - s_covSR).max()
(tensor(0., dtype=torch.float64, grad_fn=<MedianBackward0>),
 tensor(3.5527e-15, dtype=torch.float64, grad_fn=<MaxBackward1>))
show_as_row(smooth_state.mean[0][0], smooth_state.cov[0][0])

#0

tensor([[-0.3279],
        [ 0.4029],
        [-0.7501],
        [-0.0098]], dtype=torch.float64, grad_fn=)

#1

tensor([[ 0.9163, -0.0046,  0.0978, -0.1494],
        [-0.0046,  0.7132,  0.2841,  0.0932],
        [ 0.0978,  0.2841,  1.1262,  0.0916],
        [-0.1494,  0.0932,  0.0916,  1.3623]], dtype=torch.float64,
       grad_fn=)
show_as_row(smooth_state.mean.shape, smooth_state.cov.shape)

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])

KalmanFilter method

mask
tensor([[[ True, False, False],
         [ True, False,  True],
         [False,  True, False],
         [False, False, False],
         [False, False,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True, False],
         [ True, False,  True],
         [ True,  True,  True]],

        [[ True,  True,  True],
         [ True,  True, False],
         [ True,  True,  True],
         [ True, False,  True],
         [False,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True]]])
torch.argwhere((~mask).any(-1).any(0)).min()
tensor(0)
torch.ones(1,2,0)
tensor([], size=(1, 2, 0))

source

KalmanFilterSR.smooth

 KalmanFilterSR.smooth (obs:torch.Tensor, mask:torch.Tensor,
                        control:torch.Tensor)

Kalman Filter Smoothing

Type Details
obs Tensor
mask Tensor
control Tensor
Returns ListMultiNormal [n_timesteps, n_dim_state] smoothed state
smoothed_state = kSR.smooth(data, mask, control)
show_as_row(smoothed_state.mean.shape, smoothed_state.cov.shape)

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])
smoothed_state.cov.isnan().any()
tensor(False)
smoothed_state.mean.sum().backward(retain_graph=True)
A.grad
tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]], dtype=torch.float64)
smoothed_state_stand = k.smooth(data, mask, control)
(smoothed_state.mean - smoothed_state_stand.mean).max()
(smoothed_state.cov - smoothed_state_stand.cov).max()
tensor(0.6755, dtype=torch.float64, grad_fn=<MaxBackward1>)
def fuzz_smooth_SR(n=10):
    errs = {'mean': [], 'cov': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        k = KalmanFilter.init_from(kSR)
        dat = get_test_data(20, 5, 4)
        mean, cov = k.smooth(*dat)
        meanSR, covSR = kSR.smooth(*dat)
        errs['mean'].append((meanSR - mean).abs().max().item())
        errs['cov'].append((covSR - cov).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_smooth_SR(10)

err.median(), err.max()
(mean    9.733464e-12
 cov     8.838374e-12
 dtype: float64,
 mean    4.984400e-09
 cov     4.896766e-09
 dtype: float64)

Predict

The predict method can be vectorized across both the batch and the timesteps, except for timesteps that require conditional predictions.

Obs from State
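The smoothed state is mapped through the observation model using the formula from the introduction; a minimal sketch (illustrative names, not the library's _obs_from_state):

def obs_from_state_sketch(H, d, R, state_mean, state_cov):
    # p(y_t | x_t) = N(H m + d, R + H P H^T)
    mean = H @ state_mean + d.unsqueeze(-1)
    cov = R + H @ state_cov @ H.mT
    return mean, cov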

smoothed_state.mean.shape, smoothed_state.cov.shape
(torch.Size([2, 10, 4, 1]), torch.Size([2, 10, 4, 4]))
pred_obs0 = kSR._obs_from_state(smoothed_state)
pred_obs0.mean.shape, pred_obs0.cov.shape
(torch.Size([2, 10, 3, 1]), torch.Size([2, 10, 3, 3]))
pred_obs0.cov.isnan().any()
tensor(False)
gap_mask = ~mask.all(-1)
gap_mask.shape
torch.Size([2, 10])

predict has several modes:

  • if pred_only_gap is True, it returns predictions only where the mask is False:
    • if use_conditional is True, it returns a list (one per batch) of lists (one per timestep) of tensors of shape [1, gap_len]
    • if use_conditional is False, it returns a list (one per batch) of tensors of shape [n_times_gap, n_dim_obs]

Masked Batch

gap_mask = ~mask.all(-1)
from pprint import pp
pp(_masked2batch(mask[gap_mask], mask)[0])
[tensor([False, False]),
 tensor([False]),
 tensor([False, False]),
 tensor([False, False, False]),
 tensor([False, False]),
 tensor([False]),
 tensor([False])]
str(_masked2batch(mask[gap_mask], mask)[0])
'[tensor([False, False]), tensor([False]), tensor([False, False]), tensor([False, False, False]), tensor([False, False]), tensor([False]), tensor([False])]'
show_as_row(all_mask = mask[0] , only_gap=_masked2batch(mask[gap_mask], mask)[0])

all_mask

tensor([[ True, False, False],
        [ True, False,  True],
        [False,  True, False],
        [False, False, False],
        [False, False,  True],
        [ True,  True,  True],
        [ True,  True,  True],
        [ True,  True, False],
        [ True, False,  True],
        [ True,  True,  True]])

only_gap

[tensor([False, False]),
 tensor([False]),
 tensor([False, False]),
 tensor([False, False, False]),
 tensor([False, False]),
 tensor([False]),
 tensor([False])]
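The conversion above can be reproduced with a small sketch of _masked2batch (a hypothetical reimplementation, assuming mask has shape [n_batch, n_times, n_dim_obs] and x_gap holds the timesteps selected by gap_mask):

def masked2batch_sketch(x_gap, mask):
    gap_mask = (~mask).any(-1)
    out, i = [], 0
    for b in range(mask.shape[0]):
        rows = []
        for t in range(mask.shape[1]):
            if gap_mask[b, t]:
                # keep only the entries at the gap positions of this timestep
                rows.append(x_gap[i][~mask[b, t]])
                i += 1
        out.append(rows)
    return out

With these assumptions, masked2batch_sketch(mask[gap_mask], mask)[0] reproduces the list shown above.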

Predict


source

KalmanFilterSR.predict

 KalmanFilterSR.predict (obs, mask, control, smooth=True)

Predicted observations at all times

Exploration


source

with_settings

 with_settings (k, **kwargs)

source

replacing_ctx

 replacing_ctx (*args)
with with_settings(kSR, use_conditional=False, pred_only_gap=False):
    pred_mean, pred_cov = kSR.predict(data, mask, control)
show_as_row(mean= pred_mean.shape, cov = pred_cov.shape)

mean

torch.Size([2, 10, 3])

cov

torch.Size([2, 10, 3, 3])
with with_settings(kSR, use_conditional=False, pred_only_gap=True):
    pred_mean, pred_cov = kSR.predict(data, mask, control)
show_as_row(mean= pred_mean[0], cov = pred_cov[0])

mean

[tensor([-0.3666,  0.3765], dtype=torch.float64, grad_fn=),
 tensor([-0.0844], dtype=torch.float64, grad_fn=),
 tensor([0.1625, 0.3119], dtype=torch.float64, grad_fn=),
 tensor([-0.1102, -0.5474, -0.1901], dtype=torch.float64,
       grad_fn=),
 tensor([-0.1439, -0.4996], dtype=torch.float64, grad_fn=),
 tensor([0.4966], dtype=torch.float64, grad_fn=),
 tensor([0.0202], dtype=torch.float64, grad_fn=)]

cov

[tensor([[2.5518, 2.1682],
        [2.1682, 2.8766]], dtype=torch.float64, grad_fn=),
 tensor([[2.3344]], dtype=torch.float64, grad_fn=),
 tensor([[1.9095, 0.5371],
        [0.5371, 3.0221]], dtype=torch.float64, grad_fn=),
 tensor([[2.0929, 1.1465, 0.8395],
        [1.1465, 2.8696, 2.5367],
        [0.8395, 2.5367, 3.5092]], dtype=torch.float64,
       grad_fn=),
 tensor([[1.9076, 0.8393],
        [0.8393, 2.3729]], dtype=torch.float64, grad_fn=),
 tensor([[2.7991]], dtype=torch.float64, grad_fn=),
 tensor([[2.2841]], dtype=torch.float64, grad_fn=)]
with with_settings(kSR, use_conditional=True, pred_only_gap=True):
    pred_mean, pred_cov = kSR.predict(data, mask, control)
show_as_row(mean= pred_mean[0], cov = pred_cov[0])

mean

[tensor([0.0292, 0.6235], dtype=torch.float64, grad_fn=),
 tensor([0.4156], dtype=torch.float64, grad_fn=),
 tensor([0.3672, 0.7999], dtype=torch.float64, grad_fn=),
 tensor([-0.1102, -0.5474, -0.1901], dtype=torch.float64,
       grad_fn=),
 tensor([-0.0686, -0.1575], dtype=torch.float64, grad_fn=),
 tensor([0.4823], dtype=torch.float64, grad_fn=),
 tensor([0.1720], dtype=torch.float64, grad_fn=)]

cov

[tensor([[2.5518, 2.1682],
        [2.1682, 2.8766]], dtype=torch.float64, grad_fn=),
 tensor([[2.3344]], dtype=torch.float64, grad_fn=),
 tensor([[1.9095, 0.5371],
        [0.5371, 3.0221]], dtype=torch.float64, grad_fn=),
 tensor([[2.0929, 1.1465, 0.8395],
        [1.1465, 2.8696, 2.5367],
        [0.8395, 2.5367, 3.5092]], dtype=torch.float64,
       grad_fn=),
 tensor([[1.9076, 0.8393],
        [0.8393, 2.3729]], dtype=torch.float64, grad_fn=),
 tensor([[2.7991]], dtype=torch.float64, grad_fn=),
 tensor([[2.2841]], dtype=torch.float64, grad_fn=)]
with with_settings(kSR, use_conditional = True, pred_only_gap = False):
    test_fail(kSR.predict, [data, mask, control]) # this parameter combination is invalid

Gap only prediction

Copied from another notebook to make the visualization easier.


source

buffer_pred_single

 buffer_pred_single (preds:list[torch.Tensor], masks:torch.Tensor)

For predictions that cover only the gaps, add a buffer of NaN so they have the same shape as the targets


source

buffer_pred

 buffer_pred (preds:list[list[torch.Tensor]], masks:torch.Tensor)

For predictions that cover only the gaps, add a buffer of NaN so they have the same shape as the targets
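A minimal sketch of the single-batch padding (hypothetical name; assuming preds holds one flat tensor per gap timestep, in time order):

import torch

def buffer_pred_single_sketch(preds, masks):
    # start from an all-NaN tensor with the shape of the targets
    out = torch.full(masks.shape, float('nan'), dtype=preds[0].dtype)
    gap_rows = (~masks).any(-1).nonzero(as_tuple=True)[0]
    for row, p in zip(gap_rows, preds):
        # fill only the positions that were missing at this timestep
        out[row, ~masks[row]] = p
    return out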

with with_settings(kSR, use_conditional=False, pred_only_gap=True):
    pred_m_gap, _ = kSR.predict(data, mask, control)
    
with with_settings(kSR, use_conditional=False, pred_only_gap=False):
    pred_m, _ = kSR.predict(data, mask, control)

So this is a problem: these should be the same!

show_as_row(gap = buffer_pred(pred_m_gap, mask), no_gap = pred_m)

gap

tensor([[[    nan, -0.3666,  0.3765],
         [    nan, -0.0844,     nan],
         [ 0.1625,     nan,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,  0.4966],
         [    nan,  0.0202,     nan],
         [    nan,     nan,     nan]],

        [[    nan,     nan,     nan],
         [    nan,     nan,  0.2059],
         [    nan,     nan,     nan],
         [    nan, -0.3611,     nan],
         [ 0.1032,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan]]], dtype=torch.float64)

no_gap

tensor([[[ 0.0322, -0.3666,  0.3765],
         [ 0.1475, -0.0844,  0.4320],
         [ 0.1625, -0.0796,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,  0.0158],
         [ 0.1736, -0.0500,  0.1638],
         [ 0.0998, -0.2115,  0.2026],
         [ 0.2642,  0.0727,  0.4966],
         [ 0.2667,  0.0202,  0.5999],
         [ 0.5621,  0.4854,  1.0911]],

        [[ 0.0523, -0.3558,  0.1773],
         [ 0.0572, -0.2074,  0.2059],
         [ 0.0361, -0.2809,  0.0575],
         [-0.0671, -0.3611,  0.1202],
         [ 0.1032, -0.2193, -0.1928],
         [-0.1298, -0.5874, -0.3787],
         [ 0.0450, -0.2554, -0.2029],
         [ 0.0938, -0.0932,  0.3782],
         [ 0.2229, -0.1440, -0.1133],
         [ 0.3722,  0.3017,  1.1004]]], dtype=torch.float64,
       grad_fn=)
state = kSR.smooth(data, mask, control)
state.mean.shape
torch.Size([2, 10, 4, 1])
gap_mask = ~mask.all(-1)
# this destroys the batch structure! it is rebuilt below
state_gap = state[gap_mask]
state_gap.mean.shape
torch.Size([10, 4, 1])
pred_obs = kSR._obs_from_state(state)
pred_obs.mean.squeeze_(-1)
tensor([[[ 0.0322, -0.3666,  0.3765],
         [ 0.1475, -0.0844,  0.4320],
         [ 0.1625, -0.0796,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,  0.0158],
         [ 0.1736, -0.0500,  0.1638],
         [ 0.0998, -0.2115,  0.2026],
         [ 0.2642,  0.0727,  0.4966],
         [ 0.2667,  0.0202,  0.5999],
         [ 0.5621,  0.4854,  1.0911]],

        [[ 0.0523, -0.3558,  0.1773],
         [ 0.0572, -0.2074,  0.2059],
         [ 0.0361, -0.2809,  0.0575],
         [-0.0671, -0.3611,  0.1202],
         [ 0.1032, -0.2193, -0.1928],
         [-0.1298, -0.5874, -0.3787],
         [ 0.0450, -0.2554, -0.2029],
         [ 0.0938, -0.0932,  0.3782],
         [ 0.2229, -0.1440, -0.1133],
         [ 0.3722,  0.3017,  1.1004]]], dtype=torch.float64,
       grad_fn=<SqueezeBackward3>)
pred_obs_gap = kSR._obs_from_state(state_gap)
pred_obs_gap.mean.squeeze_(-1)
tensor([[ 0.0322, -0.3666,  0.3765],
        [ 0.1475, -0.0844,  0.4320],
        [ 0.1625, -0.0796,  0.3119],
        [-0.1102, -0.5474, -0.1901],
        [-0.1439, -0.4996,  0.0158],
        [ 0.2642,  0.0727,  0.4966],
        [ 0.2667,  0.0202,  0.5999],
        [ 0.0572, -0.2074,  0.2059],
        [-0.0671, -0.3611,  0.1202],
        [ 0.1032, -0.2193, -0.1928]], dtype=torch.float64,
       grad_fn=<SqueezeBackward3>)
(pred_obs[gap_mask].mean == pred_obs_gap.mean).all() # so far good
tensor(True)
pred_obs.mean.shape
torch.Size([2, 10, 3])
mask.shape
torch.Size([2, 10, 3])
show_as_row(pred = buffer_pred(_masked2batch(pred_obs_gap.mean, mask), mask), mask = mask, all = pred_obs.mean)

pred

tensor([[[    nan, -0.3666,  0.3765],
         [    nan, -0.0844,     nan],
         [ 0.1625,     nan,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,  0.4966],
         [    nan,  0.0202,     nan],
         [    nan,     nan,     nan]],

        [[    nan,     nan,     nan],
         [    nan,     nan,  0.2059],
         [    nan,     nan,     nan],
         [    nan, -0.3611,     nan],
         [ 0.1032,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan]]], dtype=torch.float64)

mask

tensor([[[ True, False, False],
         [ True, False,  True],
         [False,  True, False],
         [False, False, False],
         [False, False,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True, False],
         [ True, False,  True],
         [ True,  True,  True]],

        [[ True,  True,  True],
         [ True,  True, False],
         [ True,  True,  True],
         [ True, False,  True],
         [False,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True]]])

all

tensor([[[ 0.0322, -0.3666,  0.3765],
         [ 0.1475, -0.0844,  0.4320],
         [ 0.1625, -0.0796,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,  0.0158],
         [ 0.1736, -0.0500,  0.1638],
         [ 0.0998, -0.2115,  0.2026],
         [ 0.2642,  0.0727,  0.4966],
         [ 0.2667,  0.0202,  0.5999],
         [ 0.5621,  0.4854,  1.0911]],

        [[ 0.0523, -0.3558,  0.1773],
         [ 0.0572, -0.2074,  0.2059],
         [ 0.0361, -0.2809,  0.0575],
         [-0.0671, -0.3611,  0.1202],
         [ 0.1032, -0.2193, -0.1928],
         [-0.1298, -0.5874, -0.3787],
         [ 0.0450, -0.2554, -0.2029],
         [ 0.0938, -0.0932,  0.3782],
         [ 0.2229, -0.1440, -0.1133],
         [ 0.3722,  0.3017,  1.1004]]], dtype=torch.float64,
       grad_fn=)
pred_gap_buff = buffer_pred(_masked2batch(pred_obs_gap.mean, mask), mask)
mask_na = ~pred_gap_buff.isnan()
test_close(pred_gap_buff[mask_na], pred_obs.mean[mask_na])
with with_settings(kSR, use_conditional=False, pred_only_gap=True):
    pred_gap_buff = buffer_pred(kSR.predict(data, mask, control).mean, mask)
mask_na = ~pred_gap_buff.isnan()
with with_settings(kSR, use_conditional=False, pred_only_gap=False):
    pred_ng = kSR.predict(data, mask, control).mean
test_close(pred_gap_buff[mask_na], pred_ng[mask_na])
show_as_row(pred = pred_gap_buff, mask = mask, all = pred_ng)

pred

tensor([[[    nan, -0.3666,  0.3765],
         [    nan, -0.0844,     nan],
         [ 0.1625,     nan,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,  0.4966],
         [    nan,  0.0202,     nan],
         [    nan,     nan,     nan]],

        [[    nan,     nan,     nan],
         [    nan,     nan,  0.2059],
         [    nan,     nan,     nan],
         [    nan, -0.3611,     nan],
         [ 0.1032,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan]]], dtype=torch.float64)

mask

tensor([[[ True, False, False],
         [ True, False,  True],
         [False,  True, False],
         [False, False, False],
         [False, False,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True, False],
         [ True, False,  True],
         [ True,  True,  True]],

        [[ True,  True,  True],
         [ True,  True, False],
         [ True,  True,  True],
         [ True, False,  True],
         [False,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True]]])

all

tensor([[[ 0.0322, -0.3666,  0.3765],
         [ 0.1475, -0.0844,  0.4320],
         [ 0.1625, -0.0796,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,  0.0158],
         [ 0.1736, -0.0500,  0.1638],
         [ 0.0998, -0.2115,  0.2026],
         [ 0.2642,  0.0727,  0.4966],
         [ 0.2667,  0.0202,  0.5999],
         [ 0.5621,  0.4854,  1.0911]],

        [[ 0.0523, -0.3558,  0.1773],
         [ 0.0572, -0.2074,  0.2059],
         [ 0.0361, -0.2809,  0.0575],
         [-0.0671, -0.3611,  0.1202],
         [ 0.1032, -0.2193, -0.1928],
         [-0.1298, -0.5874, -0.3787],
         [ 0.0450, -0.2554, -0.2029],
         [ 0.0938, -0.0932,  0.3782],
         [ 0.2229, -0.1440, -0.1133],
         [ 0.3722,  0.3017,  1.1004]]], dtype=torch.float64,
       grad_fn=)

Conditional
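With use_conditional the missing entries of \(y_t\) are additionally conditioned on the values observed at the same timestep (note below that at the timestep where all three dimensions are missing the two predictions agree). A minimal sketch of the Gaussian conditioning for one timestep (illustrative names, not the library's exact code):

import torch

def condition_on_observed(mean, cov, y, observed):
    # observed: boolean mask over the observation dimensions of one timestep
    o, m = observed, ~observed
    S_oo, S_mo = cov[o][:, o], cov[m][:, o]
    # mu_{m|o} = mu_m + S_mo S_oo^{-1} (y_o - mu_o)
    cond_mean = mean[m] + S_mo @ torch.linalg.solve(S_oo, y[o] - mean[o])
    # S_{mm|o} = S_mm - S_mo S_oo^{-1} S_om
    cond_cov = cov[m][:, m] - S_mo @ torch.linalg.solve(S_oo, S_mo.mT)
    return cond_mean, cond_cov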

with with_settings(kSR, use_conditional=False, pred_only_gap=True):
    pred_gap = buffer_pred(kSR.predict(data, mask, control).mean, mask)
with with_settings(kSR, use_conditional=True, pred_only_gap=True):
    pred_gap_cond = buffer_pred(kSR.predict(data, mask, control).mean, mask)
show_as_row(no_conditional = pred_gap, conditional = pred_gap_cond)

no_conditional

tensor([[[    nan, -0.3666,  0.3765],
         [    nan, -0.0844,     nan],
         [ 0.1625,     nan,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,  0.4966],
         [    nan,  0.0202,     nan],
         [    nan,     nan,     nan]],

        [[    nan,     nan,     nan],
         [    nan,     nan,  0.2059],
         [    nan,     nan,     nan],
         [    nan, -0.3611,     nan],
         [ 0.1032,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan]]], dtype=torch.float64)

conditional

tensor([[[    nan,  0.0292,  0.6235],
         [    nan,  0.4156,     nan],
         [ 0.3672,     nan,  0.7999],
         [-0.1102, -0.5474, -0.1901],
         [-0.0686, -0.1575,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,  0.4823],
         [    nan,  0.1720,     nan],
         [    nan,     nan,     nan]],

        [[    nan,     nan,     nan],
         [    nan,     nan,  0.5191],
         [    nan,     nan,     nan],
         [    nan,  0.2014,     nan],
         [ 0.6513,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan]]], dtype=torch.float64)
mask[gap_mask]
tensor([[ True, False, False],
        [ True, False,  True],
        [False,  True, False],
        [False, False, False],
        [False, False,  True],
        [ True,  True, False],
        [ True, False,  True],
        [ True,  True, False],
        [ True, False,  True],
        [False,  True,  True]])
data[gap_mask]
tensor([[0.8775,    nan,    nan],
        [0.6706,    nan, 0.9272],
        [   nan, 0.4967,    nan],
        [   nan,    nan,    nan],
        [   nan,    nan, 0.4760],
        [0.9991, 0.1775,    nan],
        [0.6734,    nan, 0.6468],
        [0.3725, 0.2052,    nan],
        [0.5927,    nan, 0.6441],
        [   nan, 0.9132, 0.0329]], dtype=torch.float64)
assert is_posdef(pred_cov[0][0]).all()
state = kSR.smooth(data, mask, control)
assert not state.cov.isnan().any()
kSR.use_sr_pred = False
pred_obs = kSR._obs_from_state(state)
assert not pred_obs.cov.isnan().any()
cov2std(pred_obs.cov).isnan().any()
tensor(False)
assert not kSR.predict(data, mask, control).cov.isnan().any()
assert not kSR.predict(data, mask, control, smooth=False).cov.isnan().any()
kSR._predict_filter(data, mask, control).std.shape
torch.Size([2, 10, 3, 3])
kSR.use_conditional = False
pred = kSR.predict(data, mask, control, smooth=True)
pred.mean.shape, pred.std.shape
AttributeError: 'ListMultiNormal' object has no attribute 'std'
filt_state, pred_state = kSR._filter_all(data, mask, control)
mean, cov = kSR._obs_from_state(pred_state)
is_posdef(cov @ cov.mT)
pred

Debug NaN

import polars as pl
import altair as alt
kSR.predict(*get_test_data(200)).cov.isnan().any()
data.shape
reset_seed()

SR Filter

nan = [{'nan':kSR.predict(*get_test_data(n), smooth=smooth).cov.isnan().any().item(), 'n': n, 'rep': rep, 'smooth': smooth}
       for n in [10, 20, 30, 40, 50, 55, 60, 70, 100, 150, 200] for rep in range(10) for smooth in [True, False]]
nan_df = pl.DataFrame(nan).groupby(['nan', 'n', 'smooth']).count().to_pandas()
alt.Chart(nan_df).mark_line().encode(x='n:Q', y='count', color='nan', column='smooth')

Standard Filter

k = KalmanFilter.init_from(kSR)
nan = [{'nan':k.predict(*get_test_data(n), smooth=smooth).std.isnan().any().item(), 'n': n, 'rep': rep, 'smooth': smooth}
       for n in [10, 20, 30, 40, 50, 55, 60, 70, 100, 150, 200] for rep in range(10) for smooth in [True, False]]
nan_df = pl.DataFrame(nan).groupby(['nan', 'n', 'smooth']).count().to_pandas()
alt.Chart(nan_df).mark_line().encode(x='n:Q', y='count', color='nan', column='smooth')

So the standard filter works better than the SR filter for the smoothing (with this parameter setting), which suggests the SR smoother can still be fixed.

But first, let's see where the NaNs come from.

The problem is a negative number on the diagonal of the covariance: the matrix is not positive definite, so even the standard deviation is NaN.
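A minimal illustration of why this produces NaN (assuming the standard deviation is taken as the square root of the covariance diagonal):

import torch

cov = torch.tensor([[1.0,  0.2],
                    [0.2, -0.5]])  # negative entry on the diagonal
print(cov.diagonal().sqrt())               # tensor([1.0000,    nan])
print(torch.linalg.cholesky_ex(cov).info)  # non-zero: not positive definite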

for i in range(20):
    dat = get_test_data(10)
    pred = kSR.predict(*dat)
    if pred.cov.isnan().any():
        print(i)
        break
k = KalmanFilter.init_from(kSR)
for p1, p2 in zip(k.parameters(), kSR.parameters()):
    test_close(p1,p2)
f_state_stand = k.filter(*dat)
s_state_stand = k.smooth(*dat)
pred_stand = k.predict(*dat)
pred_stand[1, -1].std
(f_state_stand.mean - filt_state.mean).mean()
filt_state = kSR.filter(*dat)
s_state = kSR.smooth(*dat)
is_posdef(s_state.cov)
is_posdef(s_state[1, -1].cov)
filt_state[1,-1].cov
kSR.H @ s_state[1, -1].cov @ kSR.H.mT
kSR.use_sr_pred = False
pred_obs = kSR._obs_from_state(s_state)
pred_obs[1,-1].cov
kSR.predict(*dat)
pred.std[1,-1]

Additional

Constructors

Simple parameters


source

KalmanFilter.init_simple

KalmanFilter.init_simple(2).state_dict()

Local slope

Local slope models are an extension of the local level model that also keeps track of the slope in the state variable.

Given \(n\) as the number of dimensions of the observations

The transition matrix (A) is:

\[A = \left[\begin{array}{cc}I & I \\ 0 & I\end{array}\right]\]

where:

  • \(I \in \mathbb{R}^{n \times n}\)
  • \(A \in \mathbb{R}^{2n \times 2n}\)

The state is \(x \in \mathbb{R}^{2n \times 1}\), where the upper half keeps track of the level and the lower half of the slope; accordingly \(A \in \mathbb{R}^{2n \times 2n}\).

the observation matrix (H) is:

\[H = \left[\begin{array}{cc}I & 0 \end{array}\right]\]

For the multivariate case the 1s of the univariate model are replaced with identity matrices.

Assuming the control has the same dimensions as the observations, a local slope model has \(B \in \mathbb{R}^{n_{state} \times n_{contr}}\): \[ B = \begin{bmatrix} -I & I \\ 0 & 0 \end{bmatrix}\]
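A sketch of building these block matrices for n observation dimensions (hypothetical helper, not the library's constructor):

import torch

def local_slope_matrices(n):
    I, Z = torch.eye(n), torch.zeros(n, n)
    A = torch.cat([torch.cat([I, I], -1),
                   torch.cat([Z, I], -1)])   # [[I, I], [0, I]]
    H = torch.cat([I, Z], -1)                # [I, 0]
    B = torch.cat([torch.cat([-I, I], -1),
                   torch.cat([Z, Z], -1)])   # [[-I, I], [0, 0]]
    return A, H, B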


source

KalmanFilter.init_local_slope_pca

KalmanFilter.init_local_slope_pca(2,2,pd.DataFrame([[1,2], [2,4]])).state_dict()

Export